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Chapter  1 


Introduction 

Harmonic  analysis  is  a  field  epic  in  both  scope  and  breadth.  Admitting  tech¬ 
niques  that  abut  on  the  edge  of  number  theory  and  providing  the  vehicle  for  many 
sharp  inequalities  in  PDEs  are  a  couple  of  examples  that  illustrate  this  vast  often 
disconnected  portion  of  mathematics.  With  this  being  said,  it  is  not  surprising  that 
connecting  seemingly  disparate  subjects  within  the  framework  of  harmonic  analysis 
can  be  a  feasible  reality.  This  thesis  represents  three  distinct  projects  within  the 
realm  of  pure  and  applied  harmonic  analysis.  Whether  it  be  the  representation, 
compression  and  reproduction  of  DEM  and  LIDAR  data  types  with  wavelet  and 
wedgelet  methods,  dimension  reduction  of  hyperspectral  data  by  frame-based  ker¬ 
nel  methods  or  wavelet  packet  methods  for  the  purposes  of  material  classification 
or  the  analysis  of  wavelet  systems  with  the  multiplicative  Zak  transform,  the  clear 
and  pertinent  underlying  theme  is  the  analysis  of  objects  by  both  time  and  scales. 

DEM  and  LIDAR  are  related  data  types  that  differ  from  images  by  a  great 
deal.  Instead  of  representing  color  reflectance  information  about  a  scene,  DEM 
represents  the  altitude  of  a  surface  in  a  raster  form  and  LIDAR  is  essentially  a 
point  cloud  sampling  of  such  a  surface.  The  goal  of  this  portion  of  the  thesis  is  to 
decompose  these  structures  with  an  intrinsic  transform,  so  that  when  compression  is 
performed,  the  important  underlying  geometries  are  preserved.  A  natural  place  to 
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start  with  such  a  project  is  by  parsing  through  standard  image  processing  techniques 
that  share  similar  goals.  Taking  into  account  the  way  in  which  the  data  differs  from 
images,  and  improving  on  the  techniques  inherited  by  image  processing,  we  can 
establish  a  couple  of  transform  techniques  which  perform  this  process  well.  The 
resulting  transforms,  contourlets  and  wedgelets,  are  part  of  this  larger  view  into  the 
analysis  of  time  and  scale,  and  utilizing  these  structural  components  to  understand 
these  particular  objects. 

Hyperspectral  data  is  a  pure  product  of  harmonic  analysis.  Nothing  could 
be  more  classical  than  spectral  information  of  this  sort.  While  the  methods  for 
the  reduction  of  dimension  of  general  data  sets  has  not  always  existed  exclusively 
within  the  world  of  harmonic  analysis,  the  reduction  of  dimension  of  this  particular 
type  of  data  can  certainly  be  considered  loosely  part  of  the  family.  Frame  based 
kernel  methods  for  dimension  reduction  of  hyperspectral  data  can  be  seen  in  the 
light  of  time-scale  analysis  through  the  diffusion  wavelet  viewpoint.  The  methods 
in  this  thesis  do  not  cover  this  perspective,  but  the  alternate  analysis  of  a  spatial- 
spectral  decomposition  of  hyperspectral  data  through  the  use  of  wavelet  packets 
clearly  fulfills  this  condition.  The  end  result  in  any  case  is  a  reduced  dimension 
data  cube  for  the  express  purpose  of  material  classification.  Both  of  the  methods 
above  are  shown  to  be  superior  to  classification  of  the  raw  data  by  a  collection  of 
standard  techniques. 

Wavelets  themselves  are  a  personification  of  time-scale  analysis.  Ever  since 
Calderon  showed  the  world  that  we  could  represent  with  scale  via  the  affine  group,  we 
have  been  honing  our  ability  to  generate  and  utilize  these  wavelet  systems.  It  is  this 
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same  scale  property  that  makes  the  affine  group  such  a  confounding  group  to  work 
with,  so  much  so  that  replicating  the  actions  of  the  Zak  transform  in  the  Heisenberg 
group  case  for  Gabor  systems  is  neither  simple  nor  straightforward.  However,  an 
idea  in  the  form  of  the  multiplicative  Zak  transform  yields  a  tantalizing  clue  to  the 
reproduction  of  this  process,  and  allows  us  the  ability  to  work  on  familiar  ground 
with  the  troublesome  affine  group. 

1.1  Original  Elements  of  the  Thesis 

1.1.1  Chapter  Two 

Chapter  two  concerns  the  evaluation  of  many  standard  and  modern  multi-scale 
wavelet-like  representation  schemes  for  DEM/LIDAR  data.  An  innovation  within 
this  framework  was  the  removal  of  luminance  from  the  calculation  of  TSSIM,  to 
more  aptly  describe  the  DEM  data  type. 

Another  innovation  within  this  chapter  concerns  the  structures  used  to  im¬ 
plement  the  modified  wedgelet  algorithms.  These  were  sped  up  many  times  from 
their  basic  Beamlab  levels  merely  by  using  efficient  structures  tailor  built  for  the 
process.  The  recursive  decomposition  scheme  was  created  to  minimize  the  number 
of  redundant  calculations. 

The  largest  contribution  in  this  chapter  consists  of  extending  the  wedgelet 
transform  to  working  on  raw  LIDAR  data.  This  generalization  of  the  standard  linear 
wedgelet  transform  works  on  gridded  information  and  non-uniform  data  points.  This 
effectively  allows  for  a  multitude  of  new  data  types  as  well  as  providing  a  flexible 
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utility  for  gridding  LIDAR  data,  and  removing  gaussian  and  salt-and-pepper  noise, 
when  that  noise  occurs  on  large  structural  elements,  all  at  the  same  time. 

All  of  these  methods  are  novel  when  used  on  DEM  data,  as  many  people  have 
not  utilized  the  fact  that  DEM  and  image  data  behave  very  distinctly  from  one 
another. 

1.1.2  Chapter  Three 

This  chapter  is  interesting  for  a  number  of  reasons.  The  pipeline  used  to 
implement  frame  based  kernel  methods  for  dimension  reduction  for  the  purpose  of 
material  classification  are  all  elements  that  have  been  used  apart  from  one  another 
for  a  very  long  time.  There  are  two  shared  contributions  within  this  context.  The 
first  is  that  by  ordering  the  flow  of  the  pipeline  in  a  particular  directions,  we  have 
in  effect  produced  a  new  kernel  for  dimension  reduction. 

The  second  novel  process  is  the  application  of  frames  to  replace  the  aging  con¬ 
cept  of  endmembers.  The  conceptual  difference  allows  for  a  great  deal  of  flexibility 
when  it  comes  time  to  classify  the  data,  as  frames  are  essentially  designable. 

1.1.3  Chapter  Four 

The  multiplicative  Zak  transform  has  been  around  now  for  over  a  decade. 
However,  it  hasn’t  really  progressed  from  the  original  result  found  by  Gertner  et 
al.  [35].  Part  of  the  problem  with  wavelets  is  that  they  aren’t  much  of  a  problem 
at  all.  Wavelets  tend  to  want  to  span  L2(3R),  much  more  often  than  an  average 
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function  g  could  create  a  spanning  Gabor  system. 

One  addition  to  this  ontlook  is  the  introduction  of  superframes  as  an  attempt 
to  glue  the  two  Hardy  spaces  back  together  so  that  actual  wavelet  sets  may  be 
analyzed.  Unfortunately  this  result  has  only  been  shown  to  be  true  in  one  direction, 
namely,  that  of  all  wavelet  frames  consisting  of  a  two-part  wavelet-superframe.  This 
result  is  proven  in  Theorem  7. 

The  idea  of  analyzing  the  multiplicative  Zak  transform  on  wavelet  sets  is  also  a 
new  one.  The  important  structure  in  use  is  the  dilation-partition  of  the  real-line  by  a 
wavelet  set.  This  preserves  all  of  the  important  properties  of  the  transform  itself,  and 
allows  us  to  shift  our  view  to  various  sets,  where  functions  have  the  splitting  property 
with  their  wavelet  generators.  This  leads  to  a  slightly  more  general  classification 
of  wavelet  frames,  one  that  allows  support  across  zero  in  R,  allowing  for  genuine 
wavelet  frames.  This  occurs  in  Theorem  9. 

Theorem  11  illustrates  a  very  provocative  relationship  between  the  multiplica¬ 
tive  Zak  transform,  and  the  alternate  version  of  that  transform,  in  relation  to  the 
sum  of  all  dilates  of  the  generating  function. 

The  final  innovation  of  this  thesis  is  the  inversion  formula  for  the  multiplicative 
Zak  transform  given  in  theorem  12.  Once  the  other  elements  from  the  paper  (such 
as  the  inverse  of  Theorem  7)  have  been  worked  out,  this  will  provide  a  simple 
mechanism  to  create  families  of  wavelet  frames  with  designed  properties. 
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Chapter  2 


Compression  of  LIDAR  Data  Types 
2.1  Introduction 

The  use  of  Digital  Elevation  Matrices  (DEM)  has  seen  a  rapid  expansion  of 
activity  in  many  diverse  fields  in  the  last  decade.  Among  these  fields  are  geography, 
geology  and  urban  planning.  Interest  in  using  DEM  as  a  tool  spiked  with  hurricane 
Katrina,  since  all  aspects  of  government  and  civilian  intervention  required  extensive 
and  accurate  knowledge  of  surface  details  along  the  gulf  coast.  Given  the  increasing 
prevalence  of  inexpensive  massive  data  storage,  DEM  is  finding  a  strong  following 
in  fields  where  images  used  to  suffice. 

Digital  Elevation  Matrices  are  processed  from  raw  Light  Detection  And  Rang¬ 
ing  (LIDAR)  data.  LIDAR  is  essentially  a  point-cloud  of  three-dimensional  data 
taken  by  attaching  a  light  emitter  and  collccter  to  the  underside  of  an  airplane  and 
flying  the  plane  over  the  region  of  interest.  A  combination  of  GPS  information  for 
the  plane’s  flight  path  and  collected  light  returns  are  converted  into  a  point  cloud 
that  is  a  rough  approximation  of  the  surface  that  has  been  sampled.  Specialty  tech¬ 
nicians  and  software  then  convert  this  raw  data  into  a  gridded  raster  form  that  we 
know  as  DEM.  This  is  a  painstaking  process,  often  involving  human  manipulation 
of  altitude  values  and  positional  shifts  based  on  ground  truth  and  known  artifacts. 
Whenever  DEM  is  mentioned  in  the  following  pages,  it  is  assumed  to  implicitly 
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depend  on  the  underlying  LIDAR  structure. 


Figure  2.1:  An  example  of  raw  LIDAR. 


Note  that  in  figure  2.1,  the  data  is  truly  three  dimensional,  and  non-unique 
height  values  can  be  explained  by  multiple  passes,  multi-path  propagation  or  even 
the  result  of  first  and  second  pass  LIDAR  penetrating  through  different  materials. 
The  process  of  gridding  LIDAR  into  DEM  unfortunately  removes  much  of  the  rich 
structure  that  LIDAR  data  sets  possess.  However,  the  DEM  is  a  more  efficient 
structure,  containing  a  great  deal  of  surface  information  inherited  from  the  LIDAR. 


Figure  2.2:  Two  canonical  examples  of  DEM. 


In  figure  2.2,  and  for  the  remainder  of  this  chapter,  the  z-value,  or  altitude  of  the 
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surface  at  a  pixel,  is  represented  by  a  color  between  blue  (low  altitude)  and  red 
(high  altitude). 

Figure  2.3:  Tilted  urban  DEM  displaying  surface  features. 
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The  fact  that  DEM  is  really  a  gridded  approximation  to  a  surface  allows  us  to 
utilize  the  data  much  differently  than  you  would  an  image.  In  figure  2.3  we  can  rotate 
the  camera  looking  down  onto  the  DEM  from  25°,  changing  the  light  source  and 
shading  to  show  that  the  surface  lives  in  three  dimensions.  This  kind  of  information 
is  vital  for  understanding  line  of  sight  for  mobile  phone  communications  in  a  densely 
populated  city,  or  even  to  understand  large-scale  drainage  paths  in  flood-prone  areas. 

Two  facts  separate  DEM  data  from  traditional  image  data  in  important  ways. 
First,  DEM  can  be  an  excessively  large  way  to  store  three  dimensional  data,  and 
confounds  efficient  methods  of  storage  and  transmission.  It  isn’t  that  DEM  is  an 
obnoxious  data  structure,  but  that  there  is  simply  too  much  of  it,  and  increasingly 
at  higher  and  higher  resolutions.  Only  a  very  small  portion  of  the  globe  has  been 
mapped  via  LIDAR/DEM,  and  even  with  that  supposed  dearth,  there  is  simply  too 


much  data  to  process.  Some  groups  have  been  known  to  deposit  their  DEM  data 
into  large  storehouses  often  many  miles  away  from  the  analysis  center.  In  these 
cases,  in  order  for  data  to  be  processed,  first  they  must  requisition  the  data,  and 
then  have  it  delivered.  A  very  time  consuming  and  tedious  prospect.  Secondly, 
DEM  data  is  very  distinct  from  image  data  in  that  many  techniques  that  derive 
from  image  processing  fail  to  work  correctly  or  are  inefficient  when  used  with  DEM 
data,  treated  as  if  it  were  a  black  and  white  image.  One  very  simple  way  to  visualize 
this  principle  is  to  notice  that  in  an  overhead  image  of  a  building  and  a  parking  lot, 
if  the  two  are  of  the  same  color  and  there  is  no  shadow,  it  may  appear  that  there 
is  in  fact  no  edge  separating  the  two  entities.  In  this  case,  DEM  would  very  clearly 
show  that  an  edge  exists  and  how  extreme  it  is. 


Figure  2.4:  Can  you  tell  if  there  is  an  extreme  edge  on  this  roof? 


One  solution  to  the  problem  of  large  data  sizes  for  DEM  is  through  some 
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method  of  compression.  This  is  where  the  second  fact  emerges  to  confound  the  first: 
traditional  image  processing  compression  techniques  do  not  preserve  the  important 
geometries  in  DEM  that  make  them  useful  to  an  analyst.  In  a  heavily  urban  DEM, 
and  this  will  be  the  primary  focus  of  this  chapter,  the  data  is  essentially  composed 
of  many  flat  areas  hemmed  in  by  heavy  edge  detail.  Many  traditional  wavelet 
compression  techniques  compromise  the  edges  and  add  Gibbs-like  noise  where  there 
was  none.  For  many  technicians  that  work  closely  with  DEM,  this  is  an  unacceptable 
loss  of  integrity.  The  edges  are  what  makes  the  DEM  special! 

A  straightforward  approach  to  the  solution  of  these  problems  is  by  using  mod¬ 
ified  wavelet-like  systems  that  are  amenable  to  edges  and  directional  details.  Specif¬ 
ically,  curvelets,  contourlets  and  ridgelcts  [23]  [24]  [25]  [13]  are  relatively  modern 
multi-scale  methods  that  emphasize  intelligent  tiling  of  the  frequency  domain  and 
are  thus  directionally  oriented  wavelet-like  systems  that  can  be  applied  to  the  rep¬ 
resentation  and  compression  of  DEM  data.  The  Erst  portion  of  this  chapter  will 
detail  experiments  that  conclude  the  superiority  of  contourlets  over  other  standard 
and  modified  methods  in  representing  urban  DEM.  The  idea  of  what  it  means  for 
one  method  to  be  superior  over  another  will  also  be  discussed  since  we  have  estab¬ 
lished  that  DEM  and  image  data  are  distinct.  Given  this,  it  will  be  useful  to  evalu¬ 
ate  our  quantitative  methods  of  establishing  how  closely  a  represented/compressed 
DEM  resembles  the  original  by  going  through  various  quality  indices  and  metrics 
that  attempt  to  accomplish  this  goal.  We  eventually  settle  on  the  Terrain  Struc¬ 
tural  Similarity  index  (TSSIM),  a  modified  version  of  the  structural  similarity  index 
(SSIM)  [51]  and  use  this  as  a  quality  comparison  between  any  two  DEM  tiles. 
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Since  geometry  plays  a  vital  role  in  the  underlying  structure  of  urban  DEM,  it 
makes  sense  that  techniques  based  on  scale-space  geometric  representation  may  per¬ 
form  better  than  multi-scale  methods.  The  second  section  of  this  chapter  deals  with 
studying  and  modifying  wedgelets  [26] ,  a  geomtrical  representation  method,  to  work 
specifically  with  urban  DEM.  Wedgelets  end  up  vastly  outperforming  contourlets  in 
the  representation,  compression  and  quality  reconstruction  of  heavily  urban  DEM 
data  tiles.  We  will  introduce  several  modifications  to  the  basic  wedgelet  scheme 
that  enhance  the  ability  of  the  transform  to  optimally  decompose  urban  structures 
in  DEM.  These  increase  the  potency  of  the  compression  while  maintaining  strong 
reconstruction  in  terms  of  TSSIM. 

2.2  DEM  and  LIDAR  Data  Types 

Digital  Elevation  Models/Matrices  are  real-valued  matrices  for  which  the  rows 
and  columns  represent  some  geographic  latitude  and  longitude  and  the  value  of  the 
pixel  corresponds  to  an  approximation  of  the  surface  altitude  at  that  position.  Most 
users  of  this  data  type  either  pay  to  have  raw  LIDAR  data  converted  to  its  more 
usable  form,  or  have  in-house  processes  and  technicians  that  do  the  same.  Either 
way,  raw  LIDAR  point-clouds  are  the  underlying  data  type  that  give  DEM  their 
important  structure. 

DEM  has  a  few  very  distinct  uses.  Some  very  coarse  forms  (where  the  grid 
points  are  separated  by  50  meters  or  more)  are  used  to  map  vast  regions,  moun¬ 
tainous  and  sub-sea  level  alike.  These  can  be  used  to  predict  ice/water  flow  and 
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collection  since  they  contain  information  about  the  gradient  of  elevation.  Another 
use  for  such  regions  is  the  analysis  of  straight  line  paths  connecting  two  entities  on 
separate  parts  of  the  surface.  This  can  be  helpful  in  planning  systems  of  communi¬ 
cation  that  rely  on  the  propagation  of  EM  signals.  Finer  versions  of  DEM  (where 
the  grid  points  are  separated  by  1  meter  or  less)  are  very  useful  for  analyzing  various 
aspects  of  urban  importance.  Cell-phone  tower  location,  line-of-sight  analysis  and 
noise  reduction  are  a  few  such  uses  of  the  fine-scale  urban  DEM. 

This  research  project  dealt  with  a  few  specific  data  sets  that  will  arise  again 
and  again  throughout  this  chapter.  We  have  what  is  known  as  the  S  data  set,  a 
large  portion  of  New  Orleans  (note  the  colors  with  respect  to  sea  level),  which  we 
have  carved  into  a  number  of  manageable  512  x  512  sized  data  tiles.  We  refer  to 
these  as  S\  through  Si5,  with  the  standard  urban  example  being  Si0. 


Figure  2.5:  New  Orleans  S-tile  and  the  sub-tiles  Si  through  £15. 


The  second  important  data  set  that  is  used  throughout  this  chapter  is  the 
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so-called  T-tilc.  This  is  a  large  portion  of  Alexandria  Virginia,  with  Telegraph  road 
running  right  through  the  middle.  This  tile  happened  to  contain  the  headquarters  of 
the  Topographic  Engineering  Center  (TEC).  This  is  the  organization  that  financed 
this  research  project.  Clearly  an  important  tile.  Similar  to  S,  this  is  a  very  large 
high  resolution  DEM,  that  we  broke  up  into  a  number  of  manageable  512  x  512 
sub-tiles.  We  also  refer  to  these  as  T\  through  T25. 

The  T  tile  has  a  lot  going  for  it  aside  from  its  artful  color  scheme.  This 
represents  an  area  roughly  six  thousand  meters  square,  with  1  meter  resolution.  In 
the  left  most  image  in  figure  2.7,  there  is  a  large  shaded  region  that  actually  sits 
above  the  TEC  headquarters.  This  is  one  of  the  only  regions  for  which  we  had  both 
LIDAR  and  DEM  data  representing  the  same  geographic  region.  The  shaded  region 


Figure  2.6:  The  standard  urban  tile,  S\q. 
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Figure  2.7:  Alexandria  Virginia  T-tile  and  the  sub-tiles  7\  through  T2 5. 


represents  the  boundary  of  the  LIDAR  data  that  we  received,  and  this  fortunate 
concurrence  allowed  us  to  try  and  understand  the  gridding  process  in  more  depth. 
The  amount  data  contained  within  the  point  cloud  is  very  large,  containing  roughly 
4.9  million  three  dimensional  points.  The  density  of  the  LIDAR  point  cloud  is 
similar  to  the  uniform  density  of  the  DEM.  The  point  cloud  comprises  about  13.6 
percent  of  the  points  in  the  entire  large  T  tile. 

Since  LIDAR  comes  before  DEM  in  the  processing  chain,  it  is  likely  that  the 
LIDAR  retains  more  intrinsic  geometric  information  from  the  scene.  Because  of  this 
reason,  it  seems  prudent  to  try  and  gain  an  automatic  advantage  by  working  directly 
with  the  LIDAR  and  avoiding  the  loss  of  information  that  occurs  from  gridding  it. 
This  seemed  like  a  very  good  idea,  until  it  became  clear  that  the  process  was  both 
intensive,  proprietary  and  not  reproducible  without  other  resources.  The  process 
of  gridding  LIDAR  is  for  all  intents  and  purposes,  a  black-box.  The  following  is  a 
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Figure  2.8:  This  Map  Shows  the  Density  of  the  Point  Cloud  Samples. 


brief  summary  of  our  attempts  at  studying  the  gridding  process,  how  sensitive  our 
techniques  were  to  the  gridding  mechanism,  and  ultimately  how  the  conversion  to 
DEM  is  quite  opaque.  For  this  we  create  a  26’th  T  tile,  and  pull  it  right  from  the 
region  above  the  TEC  headquarters. 

Note  that  our  gridded  versions  of  T2q  are  not  nearly  as  exciting  as  the  DEM 
version.  We  speculate  that  the  data  was  collected  at  different  times  in  the  year, 
accounting  for  several  foliage  differences.  Furthermore,  trees  can  be  a  tricky  issue  to 
accurately  model  with  L1DAR,  as  the  first-pass  does  not  often  penetrate  the  canopy 
of  the  trees,  while  the  second-pass  can.  This  can  create  a  multi-valued  surface  (which 
it  is)  near  trees,  and  is  often  corrected  by  hand.  Similar  issues  arise  over  water.  We 
specifically  repeated  this  experiment  with  another  tile,  now  T27,  trying  to  see  the 
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Figure  2.9:  Standard  Gridding  Techniques  on  the  LIDAR  Contained  in  T2q. 
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effects  of  water  on  LIDAR. 


2.2.1  Urban  DEM,  Quality  and  the  Expert 

It  is  always  an  interesting  endeavor  to  apply  mathematics  to  questions  of 
subjectivity.  How  can  you  quantify  attractiveness?  Is  Monet  e  more  creative  than 
Manet?  This  is  embellishing  for  the  actual  question  of  how  do  you  tailor  quality 
results  to  your  benefactors  when  the  paragon  of  excellence  is  to  please  an  expert? 
In  this  project,  our  indicated  goal  was  to  represent,  compress  and  reconstruct  urban 
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LIDAR  T27-Mean  LIDAR  T27-Min 


DEM  in  such  a  way  that  a  trained  expert  would  consider  the  image  representation  to 
be  reasonable.  This  kind  of  tricky  response  to  the  question,  ‘what  is  quality?,’  forced 
us  to  change  our  perspective  on  how  to  measure  visual  quality  in  a  quantitative  way. 
There  are  various  schools  of  thought  on  this  issue,  and  the  norms  that  first  get  used 
to  verify  the  quality  of  an  image  are  the  ll  norm  or  the  total- variation  norm.  It 
has  been  suggested  that  there  are  physiological  explanations  for  why  these  norms 
generally  behave  more  like  the  eye’s  response. 

Following  this  line  of  reasoning,  we  wished  to  find  a  measure  of  quality,  that 
reflected  the  human  eye’s  ability  to  distinguish  structures  and  differentiate  similar 
images  based  on  changes  that  mean-squared-error,  i1  and  total-variation  do  not 
always  pick  up  on.  If  we  can  measure  images  against  each  other,  especially  in  a 
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way  that  is  amenable  to  the  human  eye,  then  we  are  one  step  closer  to  pleasing  the 
experts.  Since  we  are  not  working  with  images,  but  scenes  in  which  structure  and 
detail  are  the  prominent  features,  we  can  narrow  our  scope  to  finding  a  measure  that 
approximates  the  human  eye’s  ability  to  distinguish  minute  structural  differences  in 
similar  scenes. 

We  will  restrict  the  held  of  applicable  DEM  data  by  insisting  that  we  only 
are  concerned  with  urban  DEM.  This  has  its  own  inherent  problems  with  lack  of  a 
rigorous  definition.  Are  the  suburbs  urban?  Is  the  Guggenheim  a  structure  that  we 
think  of  as  being  in  an  urban  environment?  This  issue  was  avoided  ahead  of  time  by 
quizzing  the  experts  and  asking  them  to  sort  a  collection  of  tiles  into  urban,  rural 
and  suburban.  In  general,  we  will  just  think  of  the  distinction  as  man-made  forms 
versus  natural  scenery.  For  us,  urban  DEM  contains  heavy  edge  discontinuities 
that  usually  lie  along  polygons.  Natural  terrain  tends  to  vary  continuously  along 
irregular  boundaries. 

2.2.2  Terrain  Structural  Similarity  Index  (TSSIM) 

The  problem  of  automatically  measuring  image  quality  in  a  way  that  agrees 
with  human  inspection  is  not  new.  In  1995,  Eskicioglu  and  Fisher  [30]  studied 
a  massive  collection  of  common  image  quality  assessment  methods  ( ip ,  MSE,  £°° 
and  many  more),  degraded  a  collection  of  black  and  white  images,  and  then  had 
human  subjects  rank  degraded  images  against  the  original.  They  then  took  the 
measurements  from  the  collection  of  quality  measures  and  looked  at  the  correlation 
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between  human  and  the  measures  concept  of  quality. 

Their  findings  were  fairly  inconclusive,  showing  some  of  the  measures  to  be 
essentially  worthless  for  predicting  a  human’s  response  to  the  quality,  but  they  did 
make  an  important  point.  It  is  not  difficult  to  create  an  example  of  a  slightly 
degraded  image,  that  is  quite  strikingly  bad  to  the  human  eye,  for  which  the  MSE 
is  quite  low.  It  really  shouldn’t  be  that  easy  to  fool  a  standard  measure  of  image 
quality.  Really  what  needs  to  be  understood  is  how  the  eye  perceives  changes  and 
to  model  that  more  efficiently. 

In  2004,  Wang,  Bovik,  Sheikh  and  Simoncelli  [51]  introduced  a  new  quality 
index,  known  as  SSIM,  and  with  it  improved  upon  this  type  of  comparison  between 
automatic  image  quality  assessment  as  compared  to  human  subjective  tests.  With 
this  experiment,  they  have  shown  their  index,  SSIM,  to  be  far  and  away  better  at 
predicting  human  quality  judgements,  over  peak  signal  to  noise  ratio  (PSNR),  which 
is  related  to  MSE,  Sarnoff  and  UQI  methods. 

Part  of  their  approach  is  to  really  pay  attention  to  the  underlying  physiology 
of  the  human  eye,  and  to  model  their  system  in  a  similar  manner.  They  remark 
that  the  perceivability  of  image  details  depends  on  the  sampling  density  of  the 
image  signal,  the  distance  from  the  image  plane  to  the  observer,  and  the  perceptual 
capability  of  the  observers  visual  system.  In  practice,  the  subjective  evaluation  of  a 
given  image  varies  when  these  factors  vary. 

They  first  act  to  separate  the  following  three  layers  from  an  image:  luminance, 
contrast  and  structure.  These  will  be  operated  on  independently. 

The  SSIM  is  a  statistical  measure.  It  is  computed  based  on  the  moments  of 
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the  pixel  value  distribution.  Fix  a  window  size  N.  Let  a  =  {a*}^  and  b  =  {bi}^=l 
be  the  pixels  corresponding  to  this  window,  centered  on  the  target  pixels  in  the  two 
images,  respectively.  We  first  compute  the  weighted  mean  for  each  window: 

N  N 

fjja  ^  ^  w^cti  and  ^  ^  Wibi. 

i—  1  i= 1 

where  the  positive  weights  w  sum  up  to  1,  thus  providing  a  weighted  average. 
The  case  when  all  the  w  are  equal  corresponds  to  the  statistical  mean  estimator, 
but  inverse  distance  weights  can  also  be  used.  We  performed  our  experiments  with 
an  11  x  11  Gaussian  window. 

The  luminance  comparison  between  the  two  target  pixels  is  given  by: 


/(a,  b) 


Wa  +  ti) 


This  index  ranges  from  0  to  1,  with  1  being  the  case  for  a=b. 
Next,  the  variances  (second  moments)  are  computed: 


N 

2  \  ^  2  2  i  2 

aa  =  2_s  Wiai  ~  da  and  ab 

i— 1 

and  the  second  component  is  defined  to  be 


N 

Y  Wibi  -  $ 

i— 1 


c(a,  b)  = 


Wa  + W. 

The  third  component  is  based  on  the  crosscorrelation: 


N 

&a,b  ^  l  Widibi  l^a/dh 
i=  1 
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and  defined  as: 


s(a,  b ) 


(  &a,b 
\0'a@b 


Combining  the  three  components  multiplicatively,  we  obtain: 


SSIM(a,  b ) 


f  2/ia/ib  +  Ci  \  f  2a aab  +  C2  \  /  <7ab  +  C3  \ 

\ri  +  7b TC~J  [al  +  al  +  CiJ  \aaab  +  C3J 


where  the  constants  Ci  are  included  for  numerical  stability  in  the  computation. 
A  MATLAB  implementation  of  the  SSIM  index  is  available  at 


http://www.cns.nyu.edu/  zwang/hles/research/ssim/  . 


For  the  purposes  of  studying  terrain  elevation  data,  we  chose  to  discard  the  lumi¬ 
nance  component,  considering  that  absolute  mean  elevation  differences  play  little 
role  in  evaluating  the  similarity  of  urban  DEMs.  We  thus  defined  the  Terrain  Struc¬ 
tural  similarity  index  as: 


TSSIM  (a,  b) 


f  2(Ja(Jb  T  C2  \  /  Vab  +  Cz  \ 

Wa  +  a2b+C2J  \aaab  +  C3)  ' 


With  the  appropriate  choice  of  constants,  this  expression  simplifies  to: 


TSSIM  (a,  b) 


(7ab  +  C 


It  should  be  noted  that  while  TSSIM  is  a  remarkable  tool  for  detecting  visual 


differences  in  DEM,  it  is  extremely  sensitive  to  translation,  rotation  and  scaling.  It 
would  clearly  not  be  a  strong  choice  if  we  were  comparing  to  sets  of  data  in  which 
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the  movement  of  a  sensor  provided  error  within  one  of  these  fields.  However,  this  is 
not  the  case  for  us,  and  in  fact,  since  DEM  has  so  much  more  underlying  structure 
than  do  black  and  white  images,  we  would  be  hard  pressed  to  find  a  better  measure 
of  quality  that  conforms  to  the  experts  as  well.  Note  that  TSSIM  is  not  a  metric, 
and  two  images  I\  and  J2  are  equal  if  and  only  if  both  have  the  same  size  and 
TSSIM(Ii,  I2)  =  1.  Furthermore,  note  that  TSSIM(a,b)  is  actually  a  matrix  of 
local  TSSIM  values.  When  we  talk  about  the  value  of  TSSIM  between  two  images, 
we  are  referring  to  the  average  across  this  entire  TSSIM  map. 

2.3  Time-Scale  Compression  Methods 

Computer  vision  and  image  processing  have  long  known  that  edges  are  the 
predominant  feature  in  images:  they  govern  the  structure  and  are  boundaries  for 
remaining  smooth  portions.  The  multi-scale  property  of  wavelets  have  brought  new 
tools  into  the  realm  of  dealing  with  edges  within  images,  but  most  often  these  tools 
are  clunky  and  difficult  to  worth  with  for  the  kinds  of  edges  typically  found  in 
images.  In  [43]  the  authors  described  the  deficiency  of  standard  wavelet  technique 
to  model  the  edges  within  images  as  a  problem  with  the  underlying  description  of 
these  edges.  The  issue  apparently  is  due  to  the  behavior  of  Besov  models  with 
relation  to  so-called  cartoon-like  models.  These  cartoon-like  models  showed  how 
sub-optimally  standard  wavelet  techniques  dealt  with  these  edges.  The  standard 
cartoon  model,  as  seen  below,  is  composed  of  two  distinct  constant  valued  regions 
separated  by  a  curve  that  is  C*2,  and  is  more  generally  in  the  literature,  piece-wise 
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To  combat  these  problems,  several  new  techniques  were  established  to  more 
optimally  decompose  the  cartoon-like  model  of  edges  within  images.  One  way  to 
think  about  this  optimal  decomposition  is  in  expressing  an  edge  along  a  nice  discon¬ 
tinuity  in  some  manner  that  is  intrinsic  to  the  image,  and  therefore  requires  very  few 
coefficients  to  store.  Vetterli  and  Do  have  a  fantastic  image  [23]  which  expresses 
this  well. 

In  2004  Candes  and  Donoho  introduced  curvelets  [13].  This  was  the  first  at¬ 
tempt  at  sparsely  approximating  the  cartoon  model,  and  doing  so  in  an  optimal 
manner.  Their  idea  is  novel  and  powerful,  and  has  lead  to  a  variety  of  important 
offshoot  results.  Contourlcts  were  introduced  by  Vetterli  and  Do  [23]  [24]  in  2005 
and  sought  to  perform  a  similar  function  as  curvelets,  with  a  new  partition  of  the  fre¬ 
quency  domain,  directional  filter-bank  analysis  and  user-friendly  discrete  framework 
for  an  algorithmic  implementation. 
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Figure  2.12:  Modeling  an  Edge  in  a  Cartoon  Model:  Do  and  Vetterli. 


Figure  2.13:  Frequency  partition  for  Contourlets. 


Ridgelets  were  a  precursor  to  Contourlets,  and  was  created  by  Vetterli  and  Do 
in  2003  [25]  to  enhance  the  ability  of  wavelet  systems  to  be  directionally  sensitive. 
Many  of  the  features  of  ridgelets  become  an  important  part  of  the  wedgelet  repre¬ 
sentation  scheme,  which  will  be  detailed  later  in  this  chapter.  In  2005,  Labate,  Lim, 
Kutyniok  and  Weiss  [44]  furthered  each  of  these  methods  by  creating  the  shearlet 
transform.  Unfortunately  this  transform  was  not  included  in  the  analysis  of  DEM 
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data  types,  as  the  transform  was  still  quite  new  at  the  time  of  this  projects  incep¬ 
tion.  It  would  be  interesting  to  measure  how  well  that  transform  would  perform 
against  the  others  gathered  already. 

This  short  section  shall  compare  a  few  standard  wavelet  compression  methods 
to  those  new  methods  which  attempt  to  encode  edge  data  in  an  optimal  way  for 
the  cartoon  model,  and  will  identify  the  method  best  able  to  represent,  compress 
and  accurately  reconstruct  urban  DEM  data  types.  To  provide  a  baseline  level  of 
comparison  to  more  traditional  image  compression  techniques,  we  have  included  the 
discrete  cosine  transform  (DCT)  [54]  and  a  couple  of  standard  (5/7  and  9/7)  wavelet 
filter  banks  [46]  into  our  list  of  techniques. 

The  experiment  consists  of  the  following  steps: 

1.  Select  a  DEM  dataset  /,  (Si  —  5j5,Ti  —  T25), 

2.  Decompose  the  DEM  data  set  /  into 

-  5/3  wavelet  hlter-bank, 

-  9/7  wavelet  hlter-bank, 

-  the  discrete  cosine  transform, 

-  the  curvelet  transform, 

-  the  contourlcts  transform  (1  level), 

-  the  contourlcts  transform  (2  levels), 

-  the  contourlcts  transform  (3  levels), 

-  the  ridgelet  transform. 
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3.  Stepping  p  back  from  100%  to  0%,  threshold  the  representations  from  2.  re¬ 
taining  p  percent  of  the  coefficients.  Reconstruct  each  threshed  representation 
and  save  the  reconstructed  image,  Ip, 

4.  Generate  curves  comparing  the  original  DEM  data  set  /  to  each  of  the  recon¬ 
structed  images  varying  by  p  percentage,  Ip,  using  the  following  metrics  and 
quality  index, 

-  !og(l  + 1|  /  -  411,2), 

-  10g(l  +  ||/  —  411, oo), 

-  ||J  —  Ip\\tv,  where  tv  is  the  total-variation  norm, 

-  TSSIM(I,Ip). 

We  have  used  this  experiment  flow-chart  with  three  sets  of  data.  First  we 
have  processed  the  fifteen  S  tiles  using  the  above  steps.  For  each  norm-type,  we 
have  averaged  across  the  fifteen  S  tiles,  creating  four  graphs,  one  for  each  of  the 
used  norms/quality  index.  Each  curve  represents  a  single  multi-scale  decomposition 
method,  with  the  results  being  averaged  across  each  of  the  S  tiles. 

Similarly,  the  T  tiles  have  been  processed  using  the  experiment  guidelines  and 
have  been  averaged  across  the  25  different  T  tiles. 

The  next  group  of  results  deals  solely  with  the  tile  S\q.  Instead  of  thresholding 
from  100  percent  to  0,  we  have  only  run  through  the  important  visual  range  of  0  to 
30  percent  of  retained  coefficients. 

These  metrics/quality  index  exhibit  very  different  behavior  as  the  DEM  data 
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%  Retained  Coefficients 


Figure  2.14:  S-tiles  Avg.:  log(l  + 


Figure  2.15:  S-tiles  Avg.:  log(l  + 


Figure  2.16:  S-tiles  Avg.: 


total  variation 


Figure  2.17:  S-tiles  Avg.:  TSSIM 


T  Tile  Averages 


%  Retained  Coefficients 


T  Tile  Averages 


Figure  2.18:  T-tiles  Avg.:  log(l  +  j|  ||2)  Figure  2.19:  T-tiles  Avg.:  log(l  + 
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T  Tile  Averages 


T  Tile  Averages 


Figure  2.20:  T-tiles  Avg.: 


total  variation 


Figure  2.21:  T-tiles  Avg.:  TSSIM 


Figure  2.22:  Sw:  log(l  +  ||  ||2) 


Figure  2.23:  S10:  log(l  +  ||  Hoc) 


Figure 


2.24:  >S'i o :  total  variation 


Figure  2.25:  S10:  TSSIM 


sets  are  degraded  more  and  more.  It  became  clear  during  this  experiment  that  the 
TSSIM  index  became  a  more  reliable  measurement  device  as  compared  to  the  ex¬ 
pert  opinions  at  the  Topographic  Engineering  Center.  In  fact,  we  established  that 
a  TSSIM  of  0.8  between  two  DEM  tiles  (especially  when  those  tiles  were  urban) 
represented  a  ‘good-enough  representation’  for  the  experts.  Interestingly  enough, 
the  Contourlets  level  3  decomposition  does  essentially  as  well  as  the  other  repre¬ 
sentations  until  approximately  20  percent  retained  coefficients.  At  this  point  the 
Contourlets  level  3  method  jumps  far  ahead  of  the  other  methods,  for  the  combined 
T  and  S  tile  averages.  Note  that  around  this  20  percent  jump,  the  TSSIM  values 
hover  at  or  above  0.8.  The  blow-up  of  this  process  for  S\q  help  to  illustrate  that 
for  urban  DEM  tiles,  this  effect  is  even  sharper.  Given  that  TSSIM  is  the  index 
for  which  we  want  to  optimize,  it  became  clear  that  contourlets  were  the  optimal 
representation  to  represent  urban  DEM. 

This  experiment  used  brute  force  to  come  to  a  quick  solution  of  how  to  repre¬ 
sent  urban  DEM  in  such  a  way  that  the  experts  find  the  representation  satisfactory. 
We  have  avoided  a  more  intricate  study  of  these  methods  and  how  various  threshold¬ 
ing  techniques,  or  even  small  changes  in  underlying  data  may  effect  the  results.  The 
most  important  information  that  comes  from  this  data,  are  the  baseline  results  that 
we  can  use  to  improve  on  the  technique.  The  next  section  of  this  chapter  will  deal 
with  a  multi-scale  geometric  representation  which  exceeds  the  ability  of  Contourlets 
to  represent  DEM  data  types. 
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2.4  Geometric  Representation:  Wedgelets 


Wavelets  have  certainly  enjoyed  a  strong  following  in  the  past  25  years,  but 
they  are  not  always  the  most  useful  or  intuitive  technique  for  certain  problems.  One 
of  the  characteristics  of  urban  DEM  is  the  proliferation  of  heavy  edges  throughout 
the  scene.  While  directional  wavelet  methods  help  to  capture  these  edges,  they  still 
add  undesirable  noise,  by  adding  Gibbs-like  ringing. 


Figure  2.26:  <Si0.  Figure  2.27:  Contourlets  defects. 

This  ringing  acts  to  smooth  out  the  gradient,  rendering  much  of  the  informa¬ 
tion  from  the  DEM  useless. 

One  way  to  deal  with  this  phenomenon  is  to  think  more  intuitively  about  how 
we  want  to  represent  this  type  of  data.  For  instance,  ask  what  kind  of  shapes  we 
are  seeing  primarily,  how  much  elevation  change  do  we  expect,  and  what  are  typical 
neighborhoods  in  the  data  resembling?  By  asking  these  specific  questions  we  can 
note  that  urban  DEM  contains  edges,  corners  and  very  few  curves.  Unless  you  are 
analyzing  the  Gugenhcim  museum,  urban  DEM  contains  flat  or  constant  sloped 
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rooves.  Furthermore,  we  typically  don’t  find  an  abundance  of  natural  shapes  like 
trees,  or  lakes  in  urban  DEM.  This  means  that  we  are  interested  in  representing 
simple  flat  and  constant  sloped  objects  that  are  well  surrounded  by  heavy  edge  and 
corner  detail. 

Figure  2.28:  Aligned  edges,  polygonal  edges  and  flat  surfaces. 


This  image,  figure  2.28  is  as  urban  as  DEM  tiles  get.  The  buildings  have  flat 
or  slightly  sloped  rooves,  the  buildings  align  on  a  grid  that  appears  to  be  shifted 
approximately  20  degrees.  There  are  very  few  details  (circled)  to  stand  in  the  way 
of  large  simple  objects  to  represent. 

2.4.1  Wedgelets 

In  [26]  Donoho  introduced  a  multi-scale  representation  scheme  that  essentially 
extracts  geometric  edge  data  from  an  image  and  encodes  it  efficiently  into  piece-wise 
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constant  functions  that  are  supported  on  a  dyadic  grid  of  wedges. 


Figure  2.29:  Examples  of  Wedgelet  Functions. 


This  construction  is  highly  amenable  to  our  urban  DEM  data  because  of  the 
prevalence  of  edges  and  piecewise  constant  regions  in  that  data. 

His  aims  are  to  optimally  represent  the  cartoon  model  sparsely,  naturally  and 
in  some  type  of  min-max  sense.  We  wish  to  use  this  method  because  intuitively 
buildings,  roads  and  similar  objects  share  many  geometric  properties  with  the  car¬ 
toon  model  (except  for  the  C 2  along  the  boundaries  of  discontinuities).  Rectangular 
buildings  with  flat  rooves  can  be  perfectly  represented  by  such  functions  supported 
on  wedges.  Collapsing  a  building  down  to  such  a  simple  function  allows  a  great  deal 
of  room  for  compression. 

We  shall  adapt  and  improve  on  Donoho’s  system  in  three  ways: 

1.  We  use  a  recursive  scheme  to  compute  the  wedgelet  transform  in  a  faster  way, 

2.  We  remove  the  constraint  that  our  representation  functions  be  piece-wise  con¬ 
stant,  allowing  for  our  surfaces  to  be  planar, 

3.  We  generalize  the  notion  of  wedgelets  to  the  underlying  data  type,  LIDAR 
point  clouds. 
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2.4.2  Piece-Wise  Constant  Wedgelets 


To  fix  ideas  and  simplify  our  notation,  we  will  describe  the  wedgelet  algorithm 
using  a  dyadic  square  image,  where  I  €  Mnxn(R)  is  an  N  x  N  image,  where  N  =  2k 
for  some  k. 

Figure  2.30:  The  Basic  Wedgelet  Set-Up. 
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Next  we  decompose  the  image  /  into  a  dyadic  multi-scale  tree,  {S'/}y=o,...,fc,  i= o,...,2J+i) 
where  j  represents  the  depth  of  the  decomposition  and  l  keeps  track  of  the  +  1 
sub-regions  at  that  depth.  So  at  each  depth  j,  there  is  a  partition  of  I  =  1  S'/. 

Figure  2.31:  First  Layer  Dyadic  Decomposition. 


The  ordering  structure  begins  at  depth  1  as,  from  left  to  right,  and  top  to 
bottom,  and  then  for  each  of  the  following  depths,  the  ordering  is  done  recursively. 

Note  that  Sj  is  an  x  |j  sub-image  of  /.  This  decomposition  continues  down 
to  pixel-scale,  i.e.  j  =  log 2(1V). 
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Figure  2.32:  Illustration  of  ordering  in  Layer  2. 
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The  next  stage  of  the  wedgelet  decomposition  of  /  involves  bisecting  each  sub- 
region  Sj  by  a  collection  of  line  segments  and  creating  a  pair  of  wedges  supported  on 
Sj .  Donoho  does  this  by  assigning  any  two  non- redundant  pixels  on  the  boundary 
of  si  as  being  end-points  of  the  edgelet  Ej(x i,x2),  where  x\,x2  <G  dSj.  Edgelets 
are  directed  from  x\  to  x2  as  a  mechanism  to  keep  different  wedges  defined  by  the 
same  edgelet  separate. 

Figure  2.33:  An  Edgelet  on  the  sub-square  Sj. 
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In  practice,  parameterizing  the  bisection  in  this  way  can  be  troublesome  com- 
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putationally.  Instead  we  opt  to  fix  a  finite  set  of  angles,  0  C  [— |,f),  and  by  a  finite 
collection  of  offsets  k  £  Z.  Since  compression  is  a  goal,  it  is  important  to  see  that 
parameterizing  onr  edgelets  in  this  manner  does  not  cost  any  extra  storage,  since 
6  can  be  chosen  from  a  finite  collection  of  pre-rendered  angles  and  k  is  naturally 
integral.  This  ordered  pair  ( 9 ,  k )  is  then  a  collection  of  two  integers  which  is  pre¬ 
cisely  the  same  amount  of  storage  needed  to  describe  edgelets  in  Donoho’s  scheme. 
This  choice  of  parametrization  does  not  come  without  cost,  and  accordingly  we  must 
build  some  machinery  to  deal  with  this  change. 

Local  vs.  Global  Coordinates:  The  first  order  of  business  is  to  recognize  the 
distinction  between  local  and  global  coordinates  when  working  with  sub-regions 
from  the  tree  {S'/}.  When  we  are  working  specifically  on  the  region  S'/,  the  local 
coordinates  will  be  taken  from  the  matrix  representing  the  image  on  that  region. 
That  is 

(x1,x2)  e  Z2  n  [0,  ^  —  l]2. 

However,  we  must  remember  that  these  points  exist  globally  within  the  context  of  the 
image  I.  In  fact,  all  lines  and  edgelets  will  be  defined  globally  to  save  computational 
time.  To  remedy  this,  we  introduce  the  coordinate  transform,  &  :  /  — »  u/}q  1S'/  by 


&(x1,x2)  =  {Xi 


x  \  23  N  __  x22j  N 

_~w~ \  2i,X2 ~  [~ir \  % 


Note  that  by  iteratively  applying  this  transform,  we  can  locate  each  l  cor¬ 
responding  to  j  so  that  the  point  ( X\ ,x2)  G  S'/.  This  is  nothing  more  than  an 
application  of  the  division  algorithm.  This  allows  the  coordinate  transform  to  be 
inverted,  by  stepping  up  through  the  depths  iteratively  as  well. 
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Figure  2.34:  An  Edgelet  on  the  sub-square  Sf  and  its  global  coordinates. 


Definition  1.  We  will  borrow  a  useful  definition  of  digital  line  from  [33].  Consider 
an  image  I  that  is  affixed  to  a  coordinate  axis  so  that  the  pixels  line  up  with  an 
integral  mesh  of  R2.  Consider  the  line  y  =  9x  +  k.  We  define  the  digital  line  L(9,  k ) 
to  be 

L(0,i)  =  {(«.,  6)  e  Z2  :  (4  -d)*<  ((a,b),vj-)  <  (k  +  |)<5S} 

where 

Sg  =  max{cos  \9\,  sin  |0|} 

and 

{(—sin  6,cos0)  |  cos#|  >  |  sin#| 

■ 

(sin  6,  —  cos  6)  otherwise 

Definition  2.  Let  6  G  [— and  let  (j,  l )  be  a  valid  set  of  coordinate  for  the  tree 
{S'/}.  We  will  decompose  this  range  of  angles  into  f  regions,  modified  by  steep/flat 
and  standard/ alternative  to 

1.  We  call  9  G  [—  |)  a  flat  angle  and  9  G  [|,  /f)  a  steep  angle, 

2.  We  call  9  G  [— 1,0)  U  [|,  a  standard  increment  set  and  9  G  [0,  |)  an 
alternate  increment  set. 
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Figure  2.35:  Examples  of  steep  and  flat  digital  lines. 


We  allow  6  e  [— and  we  identify  four  distinct  regions  in  this  set. 

Region  1.  [— ^,0)  are  flat  lines  of  standard  increment. 

Region  2.  [0,  |)  are  flat  lines  of  alternate  increment. 

Region  3.  [f ,  f )  are  steep  lines  of  alternate  increment. 

Region  4.  [|,  are  steep  lines  of  standard  increment. 

We  make  these  distinctions  for  a  few  important  reasons  concerning  the  com¬ 
putational  use  of  (9,k).  We  wish  to  make  sure  that  as  we  vary  our  lines  (of  fixed 
angle  6)  we  do  not  sweep  out  redundant  wedges. 

Note  that  a  flat  line  is  just  that;  a  collection  of  pixels  such  that  no  vertical  line 
will  ever  cross  more  than  one  pixel.  Similarly  steep  lines  are  collections  of  contiguous 
pixels  such  that  no  horizontal  line  will  every  cross  more  than  a  single  pixel. 

This  means  that  with  a  fixed  9,  a  steep  line  incremented  horizontally  will  never 
overlap  with  a  previous  or  future  version  of  itself.  A  similar  statement  is  true  of  flat 
lines  incremented  vertically. 


37 


Figure  2.36:  The  four  angular  regions. 
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Figure  2.37:  Correct  Flat  Increment.  Figure  2.38:  Incorrect  Flat  Increment. 

We  do  not  want  to  add  superfluous  calculations,  and  so  we  investigate  the 
following  cases  to  lessen  our  computational  workload. 

This  is  where  standard  and  alternative  increments  come  in  to  play.  For  9  G 
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Figure  2.39:  Correct  Steep  Increment.  Figure  2.40:  Incorrect  Steep  Increment. 

[— 1,0)  (region  1),  L(d, 0)  corresponds  to  the  first  allowable  digital  line  and  L(9,t) 
is  the  final  line.  The  increment  values  for  this  line  would  then  occupy  the  range 
[0, t\  flZ  where  these  values  are  known  to  live  on  the  vertical  axis. 

Figure  2.41:  R1  Increment  Values. 


For  6  G  [0,  |)  (region  2)  we  have  an  alternative  increment,  which  means  to 
allow  intersection  of  the  box,  our  initial  line  must  be  incremented  as  a  negative 
value.  So  the  allowable  increments  for  this  line  type  occupy  the  range  [— t,  |j]  D  7L. 
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Figure  2.42:  R2  Increment  Values. 


Region  three  and  four  have  the  following  increment  ranges. 


Figure  2.43:  R3  Increment  Values.  Figure  2.44:  R4  Increment  Values. 

Having  these  ideas  straightened  out  will  make  a  description  of  the  algorithm 
exceptionally  simple. 

Next  we  define  what  a  wedge  and  a  wedgelet  are.  Suppose  that  6  6  [— 
and  k  is  a  proper  increment  for  that  6.  We  define  the  digital  line  L(6,  k )  and  look 
at  the  following  scenarios. 

We  define  the  main  wedge  to  be  the  region  bounded  by  the  line  L and 
continuing  clockwise  back  to  that  line.  We  call  the  wedge  partitioned  opposite  the 
main  wedge,  the  sister  wedge. 

So  for  any  given  Sj  and  (6,  k )  we  can  uniquely  define  the  main  wedge  and  the 
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Figure  2.45:  R1  Wedgelet. 


Figure  2.46:  R2  Wedgelet. 


Figure  2.47:  R3  Wedgelet.  Figure  2.48:  R4  Wedgelet. 

sister  wedge,  Wm  and  W$-  Once  we  have  obtained  these  wedges,  we  can  define  the 
piece-wise  constant  functions  supported  on  them  as 


These  are  all  of  the  tools  needed  to  perform  the  wedgelet  transform  on  a  dyadic 
squared  image  /.  In  the  recursive  portion  of  the  transform,  we  make  use  of  a  local 
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error  norm.  This  norm  on  its  own  has  been  the  subject  of  much  research,  varying 
from  standards  such  as  i1,  £2  and  total  variation  norms,  to  the  idea  of  creating  a 
local  scaled  version  of  TSSIM  that  could  somehow  act  more  like  a  metric  instead  of 
an  index.  Either  way,  the  norm  used  has  to  actually  satisfy  the  triangle  inequality  to 
allow  for  the  pruning  of  the  full  coefficient  tree  to  a  sparse  reconstruction  partition. 
In  practice,  it  is  useful  to  think  of  this  norm  as  being  either  £l  or  £2. 

2.4.3  The  Wedgelet  Transform 

Algorithm  1  Wedgelet  Transform 

for  9i  G  0  do 

for  k  G  possible  increments  for  S{j  do 
WT(Sg) 

end  for 
end  for 

At  the  end  of  the  transform  we  have  a  coefficient  tree,  where  each  node  repre¬ 
sents  an  Si  and  holds  the  value  of  the  9,  k,  VwM,  V\vs  f°r  which  US'/  —  IE/ (9,  k) ||  is 

minimal. 

To  summarize  this  process,  for  each  9  and  k  that  makes  sense  in  this  context, 
and  in  a  recursive  sense,  for  each  Sj  that  intersects  the  digital  line  T(6>,fc),  we  subdi¬ 
vide  the  square  into  a  main  and  sister  wedge,  and  average  the  values  on  these  wedges 
and  form  the  wedgelet  supported  on  them.  We  measure  the  approximation  of  that 
wedgelet  to  the  original  and  if  it  is  better  than  our  previous  ’best  approximation’, 
we  save  the  data  from  this  wedgelet  at  the  Sf  node  of  our  large  data  structure. 
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Algorithm  2  WT(Sf) 

Generate  L(0,  k ). 

Create  Main  and  Sister  Wedges  Wm  and  Ws- 
Create  the  wedgelet  Wj  (0,  k ). 

Compute  the  current  error 

\\s;-wi(e,k)\\  =  E’(e,k). 

If  Ej  ( 9 ,  k )  <  Err}  then  set  the  current  node  as  the  active  node  and  replace 
Err}  =  Ef(e,  k). 

Subdivide  Sj  into  the  four  children  squares  S}+ 1 ,  S}+ 1 ,  S}^~ 1 ,  S}+ 1 . 
for  p  =  1  to  4  do 

IF  S}+ 1  D  L(6,  k)  ^  0  and  the  side  length  of  S}+ 1  is  greater  than  or  equal  to  the 
size  of  a  pixel 

wns;*1) 

end  for 
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For  a  large  image,  this  is  a  very  sizeable  computation.  The  resulting  data 
structure  contains  the  best  wedgelets  on  each  scale,  greatly  reducing  the  amount  of 
information  that  is  needed  to  be  stored.  However,  we  still  have  a  large  amount  of 
redundant  information  stored  in  the  data  structure.  Note  that  for  pretty  much  any 
discontinuity  that  resembles  the  cartoon  model,  there  will  be  some  collection  of  res¬ 
olutions  that  are  able  to  reproduce  that  discontinuity  very  well,  as  long  as  there  are 
few  other  elements  in  range  of  the  edge  and  there  are  no  corners  or  drastic  gradients. 
This  is  the  strength  of  wedgelets  to  resolve  those  cartoon  models  efficiently. 

An  example  of  a  tree  for  the  Si0  data  tile  could  show  the  following  wedge 
partitions  at  varying  scales. 


Figure  2.49:  Level  0  Wedge  Partitions.  Figure  2.50:  Level  1  Wedge  Partitions. 

Definition  3.  We  define  a  valid  reconstruction  partition  of  the  quad-tree  {S^}  to 
be  any  sub- collection  of  these  sets  (nodes  in  the  coefficient  tree)  for  which  each  pixel 
in  the  original  image  is  represented  in  one  square  Sj  once  and  only  once. 

From  our  data  structure  we  can  piece  together  many  valid  partitions  of  the 
original  image  I.  At  coarse  scales  (j  is  small)  we  have  large  wedges  that  represent 
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Figure  2.51:  Level  2  Wedge  Partitions.  Figure  2.52:  Level  3  Wedge  Partitions. 


Figure  2.53:  Level  4  Wedge  Partitions.  Figure  2.54:  Level  5  Wedge  Partitions. 

large  areas,  with  the  benefit  of  high  compression  and  at  the  cost  of  a  loss  of  minute 
details.  On  the  other  hand,  at  high  resolution  we  can  zero  in  on  perfect  recon¬ 
struction,  but  do  so  at  the  cost  of  having  to  store  immense  amounts  of  information 
(Storing  a  group  of  4  pixels  requires  4  real  numbers  to  be  stored,  while  representing 
4  pixels  as  a  wedgelet  requires  2  integers  and  2  real  numbers,  which  is  a  very  low 
savings  as  opposed  to  a  wedgelet  representing  a  much  larger  region). 

From  this  data  structure  we  need  to  construct  a  representation  that  is  optimal, 
in  some  way. 
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Figure  2.55:  Example  of  a  Valid  Partition. 


V  7T  7 


2.4.4  Pruning  the  Quad- Tree 

We  introduce  the  parameter  A  >  0.  This  parameter  will  balance  in  our  recon¬ 
struction,  the  value  of  compression  vs.  precision. 

For  any  valid  full  partition  of  /,  {Sj^}  we  can  define  the  functional 


F(Irec) 


The  goal  is  to  minimize  this  functional  on  the  tree  for  all  full  valid  partitions 
of  I.  Clearly  a  large  A  emphasizes  having  few  pieces  to  represent  /,  while  a  small  A 
tries  to  force  an  exact  approximation.  Since  our  measure  of  two  images  being  close 
is  within  the  context  of  TSSIM  and  not  actually  a  metric,  the  above  functional  is 
useful  only  as  a  guideline  for  the  pruning  procedure.  Our  modified  pruning  method 
will  therefore  not  locate  the  absolute  minimum  of  the  functional  F,  but  will  trade 
that  for  a  local  minimum  that  manages  to  have  a  competitively  large  TSSIM.  In  the 
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future,  making  TSSIM  in  a  more  metric-friendly  manner  will  aid  in  the  creation  of 


an  algorithm  that  could  find  a  global  minimum  of  such  a  functional. 

Localized  Example: 


Figure  2.56:  Local  Pruning  Example. 


? 


Starting  from  the  basic  approximation  of  the  region  of  /  corresponding  to  Sf 
by  {Sh1} P-1,2,3,4  versus  the  approximation  of  that  same  region  by  Sf,  we  can  ask 
the  question,  when  do  we  wish  to  pass  from  the  approximation  on  four  squares,  to 
the  coarser  approximation  on  a  single  larger  square? 


and 


7"“iu  ^>  =  £^+1(<+1-0 

p=  i 


Here  the  values  dj ,  kj ,  Oj^1,  kj^1  correspond  to  the  fixed  values  for  each  coming 
from  the  coefficient  data  structure  at  the  point  specified.  Locally  we  can  then  write 
the  functional  decision  as 


F(/recL)  =  A+  W/iO-l.k-i)-  I  si  =A  +  £/ 


^rec  |  u4  A  •  4  ~h 

v— i  Ip 


i\si 


p= i 


p=  i 
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So  we  decide  to  make  the  change  locally  if 


A  +  Ej  <  4A  +  #(+1 

p=  i 

or  whenever 

Ei  -  E  EC'  < 3A- 

p=i 

We  will  now  apply  this  localized  example  of  small-scale  pruning  and  use  it 
to  generate  a  formal  pruning  procedure.  Instead  of  pruning  the  tree  to  a  full  valid 
partition  by  globally  minimizing  the  functional  F\,  we  do  so  by  starting  out  with  the 
finest  possible  approximation  (namely  /  represented  at  the  pixel  level)  and  consider 
each  grouping  of  4  pixels  to  upgrade  to  a  two  by  two  representation. 

Since  the  approximation  at  pixel  level  is  perfect,  we  have  the  following  criteria 
for  a  group  of  4  pixels  to  be  upgraded  to  a  single  two  by  two  wedgelet  approximation 

Ej  <  3A 

This  algorithm  is  opportunistic  in  that  it  moves  from  clumps  that  have  the 
most  to  gain  from  becoming  a  single  wedgelet,  to  those  that  are  least  likely  to 
benefit. 

The  true  power  of  wedgelets  is  then  a  very  fast  ability  to  sort  through  recon¬ 
structions  of  varying  coarseness  based  on  the  parameter  A.  This  of  course  comes  at 
the  cost  of  a  massive  front-end  computational  overhead. 
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Algorithm  3  Pruning  the  Wedgelet  Tree 

Collect  all  clumps  of  4  pixels  all  belonging  to  the  same  parent  node. 

Collect  the  error  information  from  each  parent  node. 

Set  the  error  information  from  pixel  nodes  to  be  zero. 

ErrorList  The  list  of  errors  of  the  parent  nodes,  as  a  vector. 

ChildErrorList  The  four  child  nodes  error  information.  Here  ChildErrorList(l) 

is  a  vector  with  4  entries. 

ErrorList  •(—  SortDescending(ErrorList).  Sort  ChildErrorList  with  the  same  per¬ 
mutation. 

while  ErrorList(l)  -  SUM(ChildErrorList(l))  <  3A  do 

Flag  the  node  represented  by  position  1  as  current,  turn  its  children  nodes  off. 
Remove  entry  1  from  both  lists. 

if  The  new  entry  itself  has  3  neighbors  of  which  all  share  a  common  parent 
node  which  are  current  nodes  then 

Place  the  error  from  their  common  parent  into  the  list  ErrorList. 

Place  the  four  errors  of  these  neighbors  as  children  in  the  corresponding  place 
in  ChildErrorList. 

ErrorList  SortDescending(ErrorList).  Sort  ChildErrorList  with  the  same 
permutation. 

end  if 
end  while 
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2.4.5  Implementation  and  a  Discrete  Green’s  Theorem 

We  implemented  the  wedgelet  algorithm  in  Matlab  for  testing  purposes.  A 
helpful  speed-up  was  implemented  from  [33],  by  employing  a  discrete  version  of 
Green’s  theorem  for  rapid  moment  calculation.  In  essence,  since  steep  or  flat  lines 
sweep  out  non-redundant  wedges  and  partition  each  of  the  sub-squares  as  the  index 
is  incremented,  the  difference  between  the  moment  at  index  k  and  the  moment  at 
index  k  +  1  was  the  previous  moment  summed  with  the  information  contained  on 
the  incremented  line  L(9,  k  +  1).  This  can  be  made  clearer  with  a  couple  of  simple 
images.  Consider  that  we  already  know  the  wedgelet  on  square  5/  corresponding  to 
the  line  L(9,  k ).  That  is,  we  have 


We  wish  to  use  this  information  to  speed  up  the  calculation  of  the  wedgelet 
corresponding  to  the  incremented  line  L(9,  k  +  1).  The  following  is  an  image  of  the 
wedgelet  at  k  and  the  wedgelet  at  k  +  1. 


Figure  2.57:  Two  Consecutive  Wedgelets. 

Ws  j(k.e)  Ws  ](k+i-0) 


The  digital  line  sweeps  out  unique  and  complete  portions  of  the  square,  and  so 
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with  the  following  relation,  indicating  how  to  convey  information  about  Wj  ( 6 ,  k )  into 
information  about  Wj1  (9,  k  + 1),  we  can  arrive  at  a  formula  which  is  true  independent 
of  k,  as  long  as  the  line  L(9,  k)  and  L(9,  k  +  1)  do  not  sweep  out  the  same  wedgelet 
on  Sf. 


Figure  2.58:  Expression  Using  Wj*(0,  k )  to  Find  Wj*(0,  k  +  1). 


Ws  J(k-0) 


Wg  j(ktl.S) 


1 

That  is,  if  we  write  the  known  information  as 


VMj(0,k)  = 


\Wi 


Mi 


_  cJ  I 

j  I  M  WM 


and 


Vsi(0,k)  = 


T,si\ 


\Wsi\  ^  ~l  IWs 


then  we  can  write  the  updated  moments  simply  as 


(2.2) 


(2.3) 
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(2.4) 


and 


VM{(0,k  +  1) 


wMi(e,k)\vMi(e,k)  +  zsi\mk+1]\ 
wMi{e,k)\  +  \L{e,k  +  i)ns{\  ) 


Wsi(9,k)\Vsl(9,k)-ZSl\mk+1)\ 

WMjt  (9,  k)  \  -  \L(9,  k +  l)n  St  \  ) 


This  leaves  us  with  the  updated  wedgelet 


(2.5) 


W?(9,k  +  1)  —  VMi  (9,  k  +  +  VsJi(9,  k  +  l)l(w^\L(0,fc+i))-  (^-6) 

This  means  that  for  this  particular  square,  when  the  angle  is  fixed,  we  can 
pre-render  all  of  the  moments  that  must  be  calculated,  and  we  can  do  so  with  a 
simple  recursive  formula.  This  is  an  over  simplification  of  how  the  pre-computed 
moments  are  calculated,  as  there  are  variances  with  changes  in  angle  causing  the 
increments  to  change,  and  the  line  to  sweep  in  a  different  direction.  There  is  also 
the  issue  of  how  to  apply  this  to  nested  squares.  That  is  where  the  discrete  Green’s 
theorem  comes  in.  In  practice,  there  are  two  cumulative  matrices  that  are  computed 
for  each  image  /  €  MNxN(] Ft).  These  are  the  vertical  and  horizontal  cumulative 
matrices  Cy  and  Ch  where 


and 


Cv(i,3)  =  '£I(k,3), 

k=  1 


k= 1 
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Note  that  these  matrices  can  easily  be  adjusted  to  any  of  the  sub-squares  Sf 
simply  by  a  little  arithmetic,  for  instance,  with  the  vertical  cumulative  matrix  on 
the  square  Sj  we  have  Cy\  where 

i  k=i 

cvi(i,j)  =  Y,si(k’ri=  E  aWM.WM) 

k=1  k=&tl(l,j) 


k=<£ fc=^,i1(1J')+1 

=  E  E 

k= 1  k= 1 

=  Cv($>I(i,j)J)-Cv($j1(  l,j)  +  l,j). 

Here  :  S?  — >  I  by 

(i,j)  ^  (&n(i,j),&l2(i,j)) 

are  the  global  coordinates  of  (i,j)  G  Sj  within  the  image  I. 

A  similar  calculation  is  true  of  the  horizontal  cumulative  matrix.  All  that  is 
left  to  implement  this  scheme  is  to  calculate  an  angular  moment  matrix  for  each  of 
the  finite  number  of  angles  in  our  resolution.  This  matrix  is  called  Cg,  where 

Ce(i,j)= 

(m,n)eWM  o(0,£) 

where  k  is  the  increment  that  allows  L(6,  k )  to  pass  through  the  point  (i,j).  This  is 
just  the  cumulative  moment  matrix  with  angle  theta  that  can  be  computed  recur¬ 
sively  from  equations  2.4,  2.5  and  2.6. 

With  all  of  these  auxiliary  matrices  defined,  the  task  of  calculating  any  wedgelet 
value  given  a  fixed  angle  6  reduces  to  adding  and  subtracting  values  from  Cy,  Ch 
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and  Co  along  the  boundary  of  the  wedgelet.  Here  is  a  pictorial  example  demonstrat¬ 
ing  how  this  process  is  carried  out.  We  wish  to  calculate  the  moment  along  this 
specific  wedgelet 

Figure  2.59:  An  Unknown  Wedgelet. 


We  then  trace  around  the  boundary  attempting  to  isolate  the  values  of  this  wedgelet. 


Figure  2.60:  Step  2.  Figure  2.61:  Step  3. 

And  finally  we  arrive  at  the  line  integral  around  the  wedgelet. 

The  authors  in  [33]  show  that  these  auxiliary  matrices  speed  the  wedgelet 
transform  process  up  by  approximately  a  factor  of  50-700  times  over  the  popular 
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Figure  2.64:  Step  6.  Figure  2.65:  Step  7. 

Beamlab  implementation  by  Donoho  [27].  Unfortunately,  the  ability  to  take  advan¬ 
tage  of  this  structure  is  lost  when  we  iterate  wedgelets  once  again  and  allow  the 
values  supported  on  the  wedges  to  be  planar. 

2.4.6  Linear  Wedgelets 

Wedgelets  provide  a  very  elegant  way  of  representing  images  that  resemble 
the  cartoon  model.  Since  urban  DEM  data  types  so  closely  resemble  these  cartoon 
models  it  is  no  wonder  that  wedgelets  can  represent  them  with  so  few  coefficients. 
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Figure  2.66:  The  Value  of  the  Wedgelet  is  Calculated. 


There  are  however  a  few  drawbacks  to  the  design  that  can  be  easily  remedied  with 
a  very  simple  observation.  Most  structures  in  urban  DEM  have  typical  roof  config¬ 
urations.  That  is,  most  structures  in  urban  DEM  have  flat  rooves,  rooves  that  have 
a  constant  gradient  or  a  double  gradient  (A-frame).  Piece-wise  constant  wedgelets 
have  a  very  difficult  time  trying  to  efficiently  encode  structures  with  planar  rooves. 

The  problem  is  that  for  a  roof  with  a  constant  gradient,  the  average  value 
of  the  wedgelet  will  always  be  the  midpoint  of  the  wedge  supported  on  the  roof. 
The  error  will  decrease  as  the  resolution  becomes  finer,  which  means  that  all  things 
being  equal,  wedgelets  will  attempt  to  always  reduce  to  finer  resolutions  to  render 
the  roof.  This  creates  a  tiling  effect  on  such  structures,  and  is  very  obvious  to 
the  eye.  Aside  from  being  an  obstacle  to  reconstruction,  it  is  an  inefficient  use  of 
coefficients.  Why  describe  a  single  large  roof  as  a  large  collection  of  small  tiles,  when 
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Figure  2.67:  A  Typical  Urban  Roof. 


Figure  2.68:  Urban  Roof  Approximation  with  Wedgelets. 


you  could  describe  the  roof  in  one  large  low-cost  chunk  of  data  in  a  more  intrinsic 
setting?  Here  we  will  make  a  small  modification  of  the  wedgelet  scheme  that  allows 
us  to  render  these  roof  structures  extremely  well,  at  the  cost  of  a  more  demanding 
data  structure  for  storage  and  reconstruction  and  a  more  computationally  intensive 
wedgelet  transformation. 

The  idea  is  simple,  the  creation  of  the  wedge  structures  remains  unchanged, 
the  only  difference  is  that  instead  of  calculating  the  value  of  the  wedgelet  function  on 
those  wedges  as  being  the  average  value  of  the  image  on  that  wedge,  we  now  apply  a 
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least  squares  fit  to  find  a  plane  that  best  fits  the  wedge.  Note  that  a  single  wedgelet 
on  a  square  Sf  (both  main  and  sister  wedge)  needs  to  store  k,  6  and  the  average  values 
Va i(k,9),Vs(k,9)  to  be  able  to  reconstruct  that  wedgelet.  The  depth  and  location 
of  the  square  Sf  do  not  need  to  be  stored  since  they  are  contained  in  the  position  of 
the  quad-tree  node.  This  means  that  a  single  wedgelet  contains  2  integers  and  2  real 
numbers  for  storage.  Once  we  have  passed  to  these  linear  wedgelets,  each  square  will 
require  the  storage  of  k,  9  as  before,  but  now  the  main  wedge  and  the  sister  wedge 
will  require  three  real  numbers  each,  to  describe  the  plane  that  is  supported  on 
them.  Our  data  storage  has  then  increased  by  4  real  numbers  per  square  that  need 
to  be  stored  for  reconstruction.  This  seems  to  be  a  counterintuitive  change,  since 
our  goal  is  one  of  compression.  But  as  we  shall  see,  the  savings  we  reap  in  being 
able  to  coarsely  represent  structures  with  planar  rooves  more  than  makes  up  for  the 
increase  in  data  storage  per  wedgelet.  There  is  another  small  fact  that  we  need  to 
be  aware  of  when  considering  a  change  that  raises  the  coefficient  count,  especially 
on  the  small  wedgelet  scale.  In  the  constant  wedgelet  case,  there  is  clearly  no  reason 
to  represent  pixels  as  wedgelets,  since  we  are  wasting  energy  trying  to  describe  in 
3  doubles  what  could  be  described  in  1  double.  In  the  constant  wedgelet  case,  the 
two- by-twos  sometimes  benefit  from  being  described  as  a  wedgelet.  If  the  two- by- 
two  were  constant  or  bi-valued,  then  the  wedgelet  can  represent  that  two-by-two  in 
3  doubles  as  opposed  to  4  doubles.  However,  this  all  changes  in  the  linear  wedgelets 
case.  First,  all  two-by-two’s  can  be  perfectly  described  by  linear  wedgelets,  assuming 
nothing  strange  is  going  on  with  the  angular  resolution.  We  must  realize  that  this 
change  has  made  it  always  inefficient  to  describe  two-by-twos  as  a  linear  wedgelet, 


since  both  storing  the  wedgelet  and  the  pixels  are  perfect  representations.  However 
the  linear  wedgelets  cost  7  doubles  per  square,  and  clearly  we  would  be  better  off 
storing  two-by-twos  by  pixel,  if  the  parameter  A  allows  such  fine  resolution. 

Figure  2.69:  Urban  Roof  Approximation  with  Linear  Wedgelets. 


2.4.7  LIDAR  Wedgelets 

We  generalized  the  linear  wedgelet  process  from  raster  images  to  a  LIDAR 
point  cloud  implementation.  We  have  been  dealing  exclusively  so  far  with  the  pro¬ 
cessed  form  of  LIDAR,  DEM,  and  we  have  had  few  opportunities  to  interact  with 
this  complex  data  type.  Part  of  the  apprehension  with  dealing  directly  with  LIDAR 
is  that  the  process  of  converting  it  to  DEM  is  still  a  mystery.  This  makes  it  very 
difficult  to  analyze  the  results  of  compression,  since  we  have  no  baseline  comparison 
in  the  form  of  a  gridded  raster  image.  We  possess  DEM  and  LIDAR  that  physi¬ 
cally  overlap  in  geographic  location,  but  either  do  not  line  up  temporally  or  have 
been  processed  beyond  our  ability  to  replicate  the  data.  Even  with  this  inability  to 
easily  test  our  results,  there  were  a  few  very  strong  reasons  to  implement  wedgelets 
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for  LIDAR.  First,  the  process  of  wedgelets  is  itself  a  gridding  mechanism,  giving 
us  another  mechanism  to  try  and  understand  the  link  between  DEM  and  LIDAR. 
Second,  the  averaging  effects  of  linear  wedgelets  tends  to  smooth  out  large  wedge 
segments  containing  Gaussian  or  salt-and-pepper  noise;  LIDAR  wedgelets  have  real 
noise  suppression  potential  when  A  is  large.  Third,  it  is  reasonable  to  think  of  a 
gridded  DEM  as  a  very  regular  point  cloud,  allowing  us  to  possibly  speed  up  or  im¬ 
prove  our  standard  wedgelet  process  on  urban  DEM.  The  extension  of  wedgelets  to 
LIDAR  is  a  true  generalization  since  the  action  on  a  DEM  by  the  LIDAR  wedgelets 
is  equivalent  to  the  standard  linear  process.  This  may  also  leave  the  door  open 
for  studying  the  effect  of  perturbations  on  the  representation.  The  fourth  reason 
for  implementing  a  point  cloud  version  of  wedgelets  is  because  the  data  structures 
and  calculations  needed  to  create  such  a  program  are  very  well  suited  and  easily 
implemented  in  C++.  Finally,  by  breaking  free  of  the  digital  line,  we  are  sometimes 
able  to  discern  edges  to  a  sub-pixel  resolution. 

Since  comparing  compression  ratios  and  quality  of  reconstruction  is  so  difficult 
when  working  with  LIDAR,  we  will  simply  leave  this  section  as  a  justification  for 
making  wedgelets  work  with  LIDAR.  It  would  be  of  great  benefit  to  have  some  ana¬ 
log  of  TSSIM  that  can  compare  a  raster  version  of  LIDAR  to  the  point  cloud  and 
make  some  kind  of  qualified  assessment  of  quality.  Unfortunately,  since  we  have  been 
relying  on  the  correlation  of  TSSIM  to  the  human  visual  system,  it  seems  unlikely 
that  such  an  easy  fix  will  be  possible.  Another  immediate  benefit  of  the  C++  imple¬ 
mentation  of  LIDAR  wedgelets  is  the  tremendous  speed-up  by  implementing  efficient 
structures  and  using  optimized  linear  algebra  routines.  The  results  that  follow  were 
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almost  all  created  with  the  LIDAR  wedgelets  package,  as  it  is  the  defacto  method 
of  choice  due  to  its  superior  speed  as  compared  to  the  Matlab  implementation. 

2.5  Experiments  and  Performance 

The  following  four  images  (figures  2.70,  2.71,  2.72,  2.73)  are  reconstructions 
of  the  DEM  tile  Sio,  using  linear  wedgelets  with  a  ten  angle  resolution,  an  P  norm 
for  local  wedgelet  selection  and  varying  the  parameter  A  =  100,  50, 10, 1.  Generally 
with  a  heavily  urban  setting,  a  reconstruction  using  A  =  1  is  indistinguishable  from 
the  original  DEM.  Note  that  the  coarser  reconstructions  still  look  reasonable  and 
in  fact  possess  large  TSSIM  values,  but  there  is  the  disturbing  tendency  for  the 
edges  of  wedgelets  that  do  not  match  up  to  appear  looming  like  an  eyesore  in  the 
reconstruction.  One  method  we  used  to  combat  this  mismatched  edge  problem  was 
to  introduce  a  local  selection  norm  that  balanced  the  overall  local  representation  vs. 
how  closely  the  edges  of  the  wedgelet  matched  the  altitudes  in  the  original  DEM. 
We  attempted  to  accept  a  slightly  inferior  0}  norm  locally,  to  have  the  obvious 
visual  disturbance  removed.  Unfortunately,  it  is  not  so  easy  to  trick  the  wedgelets 
algorithm  into  accepting  wedgelets  that  the  user  desires.  It  seems  that  in  some 
locations,  wedgelets  are  doomed  to  poor  reconstruction.  These  occur  at  corners  or 
multiple  edge  intersections  and  also  are  prevalent  when  the  layout  of  the  scene  aligns 
awkwardly  with  the  less-than-flexible  dyadic  mesh.  A  particularly  good  example  of 
this  kind  of  misalignment  with  the  decomposition  grid  occurs  on  the  large  building 
completely  in  the  scene  farthest  north.  The  dyadic  squares  center  around  a  nexus 
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of  three  edges,  and  the  best  wedgelet  fit  in  that  position  is  rarely  pleasing  to  the 
eye  unless  A  is  sufficiently  small. 


Figure  2.70:  S'iq  with  A  =  100. 


Figure  2.71:  S\q  with  A  =  50. 

slO  with  Lamda  equals  50 
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Figure  2.73:  Si0  with  A  =  1. 
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This  next  group  of  six  graphs  (figure  2.74)  attempts  to  compare  important 
quantities  for  the  wedgelet  transform  for  the  tiles  S9  and  Siq.  One  of  the  frustrating 
factors  of  compressing  a  DEM  via  a  parameter  A  is  the  lack  of  precise  control  on 
compression  ratios.  In  fact,  A  tends  to  scale  differently  with  various  types  of  images, 
leaving  a  game  of  guess-and-test  when  it  comes  to  compressing  or  reconstructing  to 
specific  levels.  When  studying  the  interaction  of  TSSIM  to  log10(A)  we  didn’t  see 
a  consistent  response  that  gave  us  a  heuristic  feel  for  compression  levels.  Because 
of  this,  we  turned  our  focus  away  from  graphs  comparing  TSSIM  to  some  variant 
of  A,  and  instead  compared  the  quality  to  the  total  number  of  coefficients.  In  this 
way  we  could  compare  the  wedgelets  procedure  to  other  methods,  like  the  time-scale 
methods  in  section  2.3. 

Now  that  we  have  fixed  our  method  of  comparison  to  be  between  TSSIM  and 
the  number  or  percentage  of  retained  coefficients,  we  can  compare  our  flat  piecewise 
constant  wedgelets  to  the  linear  wedgelets.  The  graph  in  figure  2.75  compares  these 
two  decomposition  methods  as  the  percentage  of  coefficients  decrease  and  the  image 
degrades.  The  graph  used  10  angles  for  both  methods,  and  local  i1  norm  for  wedgelet 
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selection.  The  response  to  other  tiles  in  the  S  and  T  tile  sets  were  not  as  tight  as  the 
curve  is  for  5i0.  Still,  this  points  out  why  linear  wedgelets  are  such  a  competitive 
compression  scheme  for  urban  DEM.  Note  that  at  35%  retained  coefficients,  the 
flat  wedgelets  overtake  the  linear  wedgelets  in  terms  of  quality  reconstruction.  This 
is  of  course  due  to  the  small-scale  advantage  that  flat  wedgelets  have  over  linear 
wedgelets. 

In  the  section  on  time-scale  representation  methods  for  DEM,  we  discovered 
that  contourlets  level  three  decomposition  was  our  best  method  for  compression  with 
high  reconstruction  TSSIM.  We  decided  that  a  TSSIM  of  0.8  or  above  is  adequate, 
and  contourlets  achieved  this  with  approximately  20%  used  coefficients.  Clearly  we 
are  not  interested  in  this  region  where  flat  wedgelets  overtake  the  linear  version. 
However,  note  that  linear  wedgelets  cross  the  crucial  0.8  TSSIM  mark  with  ap¬ 
proximately  17%  retained  coefficients.  This  seems  to  indicate  that  linear  wedgelets 
enjoys  a  healthy  advantage  with  moderate  sized  wedges,  when  it  can  utilize  its  larger 
number  of  coefficients  to  more  aptly  describe  large  structures. 
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Figure  2.75:  Flat  Wedgelets  vs.  Linear  Wedgelcts. 


%  Coefficients  vs.  TSSIM  for  Flat  Wedgelets  and  Linear  Wedgelets  on  S10 


The  next  two  graphs  in  hgures  2.76  and  2.77  illustrate,  to  some  extent,  the 
effects  that  angular  resolution  and  local  wedgelet  selection  norm  have  on  the  re¬ 
construction  TSSIM  on  both  Si0  and  Sg.  There  were  six  distinct  wedgelet  runs 
for  each  tile,  where  each  run  chose  either  4  or  41  angles  in  its  resolution,  and  each 
run  chose  a  local  i1,  £2  or  total  variation  norm.  Each  of  these  configurations  were 
degraded  by  changing  the  value  of  A,  and  the  graph  records  TSSIM  versus  retained 
coefficients.  It  is  fairly  difficult  to  see  the  graphs,  but  in  the  region  of  interest,  the 
order  of  quality,  while  close,  steadily  gets  better  in  quality  from  4  angles  £2  norm,  4 
angles  £l,  41  angles  £l  norm  and  finally  the  best  performer  being  41  angles  with  the 
local  £}  wedgelet  decision  norm.  This  graph  seems  to  indicate  that  greater  angular 
resolution  is  more  important  than  the  choice  of  local  wedgelet  norm,  but  they  are 
really  so  close  it  is  hard  to  say  for  certain.  All  things  being  equal,  we  will  choose 
the  £l  norm  as  it  seems  to  roundly  outperform  the  £2  norm,  especially  for  strongly 
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urban  DEM.  While  the  choice  of  norms  does  not  cost  us  computationally,  increasing 
the  angular  resolution  by  a  factor  of  ten  increases  runtime  by  a  factor  of  ten  as  well. 
The  linear  growth  in  computing  time  per  angle  makes  it  an  unattractive  parameter 
to  adjust. 


Figure  2.76:  5jo  Varied  Angles  and  Norms. 


Figure  2.77:  Sg  Varied  Angles  and  Norms. 


This  graph,  in  figure  2.78,  is  a  first  comparison  between  various  permutations 
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of  the  wedgelet  algorithm  compared  to  the  contourlets  compression.  In  the  15  to 
25  percent  retained  coefficient  range,  linear  wedgelets  with  41  angles  and  a  local 
t1  norm  are  clear  winners  in  representing  the  tile:  S'io.  Note  that  in  the  region  of 
interest,  contourlets  perform  very  well  against  flat  wedgelets. 


Figure  2.78:  Contourlets  vs.  Wedgelets. 


This  image  attempts  to  visualize  the  effect  of  angular  resolution  on  visual 
reproduction  of  S'io-  The  image  uses  piecewise  constant  wedgelets,  a  local  £2  norm 
and  each  retains  25%  of  the  coefficients.  While  the  defects  do  seem  to  smooth 
themselves  out  from  upgrading  the  angular  resolution  from  4  angles  to  40,  the 
errors  don’t  really  go  away  as  much  as  they  don’t  appear  as  abrupt  and  blocky. 
This  image  drives  home  the  inability  of  piecewise  constant  wedgelets  to  adequately 
describe  typical  urban  scenes. 

The  following  8  images  (figures  2.79  -  2.82)  are  meant  to  be  directly  compared 
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to  one  another,  as  to  show  how  the  difference  in  TSSIM  from  the  above  graphs, 
translate  into  visual  quality.  Note  that  in  each  of  the  next  6  images,  3  pairs  of 
similar  images,  the  methods  being  employed  are  reconstructed  using  25%  of  the 
coefficients.  The  contourlets  presents  a  marked  visual  enhancement  of  the  piecewise 
constant  wedgelets  on  Sw.  However,  as  you  progress  further  down,  it  is  hard  to  tell 
that  you  are  looking  at  a  reconstruction  at  all.  The  linear  wedgelets  with  4  or  40 
angles,  with  either  norm  do  exceptionally  well  visually  when  retaining  25%  of  its 
coefficients.  Just  to  prove  how  strong  the  visual  effect  is,  especially  compared  to 
piecewise  constant  wedgelets  or  contourlets,  the  seventh  image  (figure  2.83)  retains 
only  15%  of  its  coefficients  and  is  virtually  indistinguishable  from  the  original  copy 
of  S'iq  below  it  in  figure  2.84. 


Figure  2.79:  Sw  PW  Constant  Wedgelets  4/40  Angles. 


Flat  L2  4  Angle  25%  Flat  L2  40  Angle  25% 


Figure  2.80:  S'iq  Contourlets  Level  1/2. 


Contourlets  2-Level  25%  Contourlets  1-Level  25% 


The  next  two  graphs  (figures  2.85  and  2.86)  attempt  to  drive  home  the  futility 
(in  computing  time)  for  drastically  raising  the  number  of  analyzed  Wedgelet  angles. 
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Figure  2.81:  Sw  Linear  Wedgelets  4/40  Angles. 


Linear  L2  4  Angle  25%  Linear  L2  40  Angle  25% 


Figure  2.82:  Sw  Linear  Wedgelets  LI  Norm  4/40  Angles. 


Linear  LI  4  Angle  25%  Linear  LI  40  Angle  25% 


Figure  2.83:  Sio  Linear  Wedgelets  LI  Norm  40  Angles  15%  Retained  Coefficients. 


Linear  LI  40  Angle  15% 


Computing  time  scales  linearly  with  the  number  of  angles  used  in  the  decomposition; 
these  graphs  illustrate  the  gain  in  TSSIM  for  the  two  extremal  cases  for  each  of  the 
tiles  Sio  and  Sg.  The  base  case  is  a  minimal  collection  of  only  two  angles  (horizontal 
and  vertical),  while  the  second  uses  82  angles  at  each  level  of  the  decomposition. 
Note  a  very  small  gain  in  TSSIM/coefficient  with  a  very  large  computing  computing 
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Figure  2.84:  Siq  Original. 


Original  S10 


cost  a  factor  of  41. 

Figure  2.85:  Sg  Extreme  Angular  Resolution  Test. 


Figure  2.86:  S\q  Extreme  Angular  Resolution  Test. 


We  noted  that  many  defects  in  Wedgelet  images,  to  our  eyes,  took  place  on  the 
boundaries  between  dyadic  squares.  Even  though  TSSIM  indicates  quality  recon¬ 
struction  on  these  defects,  our  eyes  pick  them  up  readily.  In  an  effort  to  combat  this 
specific  problem,  a  black  box  modification  was  implemented  in  Wedgelets  to  allow 
wedge  decisions  to  be  based  on  the  particular  wedge  that  minimizes  what  we  call  the 
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original  boundary  defect  (essentially  a  line  integral  over  the  boundary  of  a  dyadic 
square  summing  the  difference  in  the  original  image  and  the  new  image).  Here  is  a 
set  of  graphs  (figures  2.87,  2.88  and  2.89)  that  display  how  the  modified  norm  acts 
to  subdue  these  visual  defects.  Two  extremal  angular  conditions  are  used,  namely  2 
and  82  angles.  In  the  percentage  of  retained  coefficients  vs.  TSSIM  graph,  a  curve 
representing  S\q  with  82  angles  and  a  local  £2  norm  was  added  as  a  reference.  Fur¬ 
thermore,  two  images  are  included  that  show  the  reconstructive  differences  between 
wedgelets  with  a  local  l2  norm  and  the  modified  boundary  total  variation  norm 
(BTV)  wedgelets,  each  with  82  angles.  They  share  a  similar  number  of  coefficients, 
approximately  40,000,  and  have  a  comparable  TSSIM,  only  differing  by  about  0.05. 
Something  to  note  is  how  accurately  the  BTV  method  finds  the  break  line  on  any 
slanted  roof.  Aside  from  this  gimmick,  the  norm  does  not  seem  to  be  able  to  find 
the  wedges  that  match  edges  with  the  underlying  DEM. 

Figure  2.87:  S'iq  Angular  Variation  and  Boundary  TV-norm. 


The  next  pair  of  graphs  (figures  2.90  and  2.91)  put  the  nail  in  the  coffin  for 
the  contourlets  verus  wedgelets  debate.  One  interesting  element  of  the  image  is  how 
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Figure  2.88:  5i0  Boundary  TV-norm.  Figure  2.89:  Sio  £2-norm. 

much  of  a  performance  hit  wedgelets  takes  to  a  rural  DEM,  namely  Sg.  Even  with 
the  performance  degradation  due  to  lack  of  cartoon-like  structures,  wedgelets  still 
outperforms  contour  lets,  visual  efficacy  per  coefficient. 


Figure  2.90:  S'io  Contourlets  vs.  Wedgelets. 


This  lengthy  analysis  of  the  wedgelet  representation  scheme,  and  its  many 
incarnations  has  shown  it  to  be  a  versatile  and  powerful  method  for  a  more  optimal 
and  intrinsic  technique  for  reproducing  the  cartoon  model  of  images.  Since  our 
urban  DEM  data  types  are  similar  to  this  simple  kind  of  structure,  and  since  our 
quality  measure  emphasizes  the  accurate  reconstruction  of  that  structure  we  have 
found  a  superior  technique  for  the  representation,  compression  and  reconstruction 
of  urban  DEM. 
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Figure  2.91:  S9  Contourlets  vs.  Wedgelets. 
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Chapter  3 


Dimension  Reduction  and  Material  Classification 
3.1  Introduction 

The  problem  of  unwieldy  data  is  one  shared  amongst  many  of  the  fields  that 
collect  and  analyze  various  sources  of  information.  The  rapid  expansion  of  computa¬ 
tional  power  and  massive  data  storage  has  engendered  within  us  a  ravenous  need  to 
expand  our  sets,  increase  resolutions  and  add  on  other  complexities  into  our  already 
burgeoning  data.  This  problem  is  exacerbated  in  the  case  of  high  dimensional  data, 
being  more  sensitive  to  the  so-called  Hughes  effect  [47].  The  popularity  of  recent 
compressed  sensing  techniques  are  a  nod  to  the  need  to  correct  for  the  over-collection 
of  information.  We  are  simply  drowning  in  this  overflow  of  information,  and  often, 
this  excess  acts  to  obfuscate  the  underlying  structures  in  the  data. 

In  many  applications  of  machine  learning,  data  mining  and  image  processing, 
high  dimensional  data  is  collected.  Examples  of  such  data  sets  could  be  composed 
of  global  climate  information,  stellar  or  terrestrial  spectra  or  even  human  genome 
distributions.  Although  the  collected  data  is  high  dimensional,  often  it  is  the  case 
that  the  data  is  intrinsically  low  dimensional.  The  following  examples  are  taken  from 
Maggioni  [45]  and  are  excellent  illustrations  of  the  power  of  dimension  reduction 
techniques. 
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Example:  Consider  a  data  cloud  consisting  of  1047  scholarly  articles  taken  from 
Science  News  in  some  relatively  small  timeframe.  We  will  collect  a  dictionary  of 
2036  relevant  words  to  the  articles  appearing  in  this  cloud.  We  order  these  words 
in  any  manner,  and  then  we  assign  each  of  the  1047  articles  to  a  vector  x k  €  R2036, 
where  the  f  th  position  of  the  fc’th  vector  is  the  frequency  in  which  word  j  appears 
in  article  k. 

We  assign  each  of  the  vectors  to  one  of  the  eight  classes:  Anthropology,  Astron¬ 
omy,  Social  Sciences,  Earth  Sciences,  Biology,  Mathematics,  Medicine  and  Physics. 
We  assign  these  based  on  the  classification  under  which  the  articles  were  published. 
We  then  have  {xk}k=\  C  R2036,  where  each  Xk  is  assigned  a  unique  class.  As  long 
as  the  fields  in  which  the  papers  are  published  are  relatively  separated  (that  is,  not 
electrical  engineering  and  applied  mathematics),  we  would  expect  the  jargon  differ¬ 
ences  would  separate  the  papers  in  this  high  dimensional  space.  Clearly  there  is  a 
lot  of  structure  in  this  data  set,  but  the  dimension  of  the  ambient  space  is  extremely 
large  for  the  kind  of  data  and  number  of  points. 

Maggioni  reduces  the  dimension  of  this  set  with  a  Laplacian  kernel  method 
and  then  embeds  the  set  into  six- dimensions.  To  visualize  the  projection,  he  then 
constructs  two  plots,  one  with  coordinates  3,4,5  represented  as  x,y,z,  and  the 
other  with  coordinates  4,5,6  represented  as  x,y,z.  The  reduced  dimension  points 
are  colored  based  on  the  class  from  which  they  derive.  These  plots  show  that  there 
is  clearly  strong  geometry  connecting  similar  papers  and  differentiating  papers  that 
are  dissimilar.  The  idea  is  that  the  embedding  preserved  all  of  the  structure  from 
the  large  dimensional  data  set,  in  fact,  possibly  enhancing  the  local  structures  by 
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embedding  it  on  an  underlying  manifold. 


Figure  3.1:  Article  Data  Projected  onto  R3:  Maggioni. 
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Example:  The  next  example  that  Maggioni  uses  to  illustrate  the  uses  of  dimension 
reduction  comes  from  a  post  office  data  set.  The  post  office  has  much  interest  in 
being  able  to  identify  hand-written  numbers  and  letters,  with  all  of  the  variations 
that  humans  can  produce.  The  data  set  consists  of  60,  000  28  x  28  gray-scale  images 
of  the  digits  1  through  9.  That  is,  we  have  {x*,}^100  C  R28-  =  R784,  where  each 
vector  is  an  unfolded  image.  We  can  similarly  assign  each  vector  to  one  of  the  classes 
{1,  2, ...,  9}.  The  included  image  shows  a  sample  of  the  data  set. 

The  data  has  its  dimension  reduced,  and  then  3  of  the  remaining  coordinates 
are  picked  out  to  visualize  the  structure  in  each  of  the  plots.  This  example  shows 
a  tremendous  amount  of  underlying  structure;  structure  that  is  only  enhanced  by 
the  reduction  of  dimension.  Such  a  system  could  be  a  very  rough  form  of  character 
recognition,  but  the  important  aspect  of  these  examples  is  that  the  structure  not 
only  survived  the  embedding,  but  appears  as  very  coherent. 

These  examples  illustrate  the  goals  of  dimension  reduction  quite  admirably. 
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Figure  3.2:  Hand- Written  Numeral  Data  Projected  onto  R3:  Maggioni. 
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Namely,  the  desire  to  embed  a  high  dimensional  data  set  onto  a  much  lower  dimen¬ 
sional  intrinsic  manifold,  while  retaining  the  pertinent  local  geometry  such  as  neigh¬ 
borhoods,  tangent  spaces  and  geodesics,  see  [5]  [6].  In  the  words  of  C.  Burges  [12], 
“Apart  from  teaching  us  about  the  data,  dimension  reduction  can  lead  to  better 
models  of  inference.” 

Modern  dimension  reduction  invariably  starts  with  Principal  Component  Anal¬ 
ysis  (PCA),  but  this  is  by  no  means  the  first  incarnation  of  embedding  methods. 
The  theory  has  beautiful  classical  roots  that  this  thesis  will  not  have  the  space  to 
document  them.  PCA  will  be  our  baseline  linear  reduction  technique,  but  our  real 
tools  will  be  the  non-linear  kernel  methods  Laplacian  Eigenmaps  (LE)  and  Locally 
Linear  Embedding  (LLE) .  Typical  kernel  methods  of  dimension  reduction  consist  of 
representing  the  data  and  local  geometry  within  a  kernel  and  approximating  that 
kernel  with  a  low-rank  operator  found  by  truncating  its  spectral  series  representa¬ 
tion. 

This  chapter  is  specifically  focused  on  the  dimension  reduction  of  terrestrial 
hyperspectral  data,  where  the  purpose  of  the  reduction  is  to  aid  in  automatic  mate¬ 
rial  classification.  With  this  narrow  scope  in  mind,  we  can  see  the  large  importance 
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we  place  on  the  preservation  of  local  geometry  through  the  embedding  process,  since 
classification  cannot  be  performed  when  dissimilar  materials  bleed  into  one  another’s 
spectral  signatures.  We  will  approach  this  problem  from  two  distinct  perspectives. 

The  first  set  of  ideas  comprises  a  so-called  dimension  reduction  pipeline.  This 
involves  implementing  a  series  of  modular  stages  giving  a  flexible  dimension  reduc¬ 
tion  framework.  These  stages  can  be  loosely  categorized  as  Landmarking,  Kernel 
application,  Out  of  sample  extension  (inversion  of  the  landmarking  process)  and 
finally  the  creation  of  an  endmember  frame  and  projection  of  the  data  onto  that 
frame. 


Figure  3.3:  Frame-Based  Kernel  Dimension  Reduction  Pipeline. 


The  landmarking  procedure  drastically  lowers  the  computational  cost  of  ex¬ 
pensive  processes  like  nearest  neighbor  generation  and  eigenvalue  decomposition  of 
large  data  sets.  This  is  essentially  done  by  sampling  the  data  and  treating  the  sam¬ 
ple  like  a  full  data  cube  for  dimension  reduction,  with  a  cost  in  accuracy  that  tends 
to  be  acceptable.  The  out-of-sample  extension  essentially  inverts  this  process  by 
extrapolating  the  dimension  reduction  to  those  points  that  were  not  sampled.  Our 
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kernel  methods  for  this  pipeline  are  either  Laplacian  Eigenmaps  or  Locally  Linear 
Embedding.  This  stage  consists  of  forming  a  graph  from  the  sampled  data,  con¬ 
structing  a  kernel  that  represents  the  local  geometry  and  then  using  a  truncated 
spectral  change  of  coordinates  to  reduce  the  dimension  of  the  points  in  the  sample. 
The  above  steps  are  all  well-known  and  are  only  unique  in  the  order  in  which  they 
are  applied.  While  this  portion  of  the  process  is  vital  for  preparing  the  data  for 
classification,  the  novel  part  of  this  pipeline  is  the  introduction  of  frames  to  replace 
traditional  endmember  constructions.  The  flexibility  that  frames  allow  with  the 
ability  to  design  elements,  and  the  added  redundancy  of  the  representations  are 
extremely  powerful  when  finally  passing  to  the  classification  stage  of  the  process. 

The  second  portion  of  this  chapter  will  briefly  develop  a  technique  that  couples 
wavelet  packets  with  kernel  dimension  reduction  methods  to  create  a  process  that 
is  sensitive  to  both  spectral  and  spatial  information.  Since  our  data  corresponds 
to  physical  geographic  locations,  the  individual  locations  contain  important  local 
information  that  is  not  otherwise  exploited  by  dimension  reduction  techniques.  This 
can  be  seen  clearly  by  observing  that  if  a  certain  location  can  be  identified  as  a 
material,  like  a  road,  that  is  typically  contiguous,  the  neighboring  spectral  positions 
(pixels)  are  more  likely  to  also  be  road  than  is  a  random  point  in  the  scene. 

This  chapter  shall  be  broken  down  into  three  sections.  Section  one  shall  dis¬ 
cuss  our  data  type:  hyperspectral  data.  Section  two  shall  develop  the  frame-based 
kernel  dimension  reduction  pipeline  and  conclude  that  this  process  outperforms  the 
corresponding  base  level  technique  on  its  own  with  respect  to  material  classification. 
Section  three  shall  develop  the  spectral-spatial  methodology,  and  it  will  be  shown 
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that  the  hybrid  method  with  PCA  outperforms  PCA  on  its  own. 

3.2  Hyperspectral  Data 

No  one  seems  to  be  quite  sure  what  the  precise  distinction  is  between  a  multi- 
spectral  data  set  and  a  hyper-spectral  set.  Spectral  information  has  become  a  ubiq¬ 
uitous  member  of  most  government  labs  and  applied  harmonic  analysis  departments 
all  around.  That  being  said,  hyperspectral  data  is  one  of  the  rarer  forms  that  this 
information  can  take.  Most  organizations  that  acquire  this  type  of  data  do  so  at 
tremendous  cost,  and  they  tend  to  keep  their  data  close  to  the  chest.  As  the  public’s 
access  to  satellites  increases  and  sensor  costs  decrease,  we  find  an  increasing  number 
of  spectral  bands  in  our  repertoire. 

Hyperspectral  data  is  still  better  described  than  rigorously  defined.  Generally 
speaking,  hyperspectral  data  is  a  large  collection  of  spectral  reflectance  information 
for  a  single  geographic  (or  in  some  cases,  geological  core  samples)  region.  In  its  most 
idealized  state,  hyperspectral  data  is  just  the  collection  of  reflectance  energy  that 
returns  from  an  exposure  to  a  specific  frequency  of  light.  An  oversimplified  version 
of  the  truth  is  that  hyperspectral  data  is  collected  by  satellites  or  airplanes  that  can 
filter  or  process  multiple  separate  wavelengths  of  light  at  the  same  time.  The  data 
typically  takes  the  form  of  a  collection  of  what  look  like  black  and  white  images  that 
compose  a  so-called  hypercube.  In  reality,  the  geographic  regions  in  each  image  are 
the  same,  but  the  intensity  of  a  pixel  in  any  particular  image  corresponds  directly 
to  that  locations  spectral  response  to  a  frequency  represented  by  that  band.  In  this 
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sense,  we  can  think  of  a  single  pixel  as  representing  a  sampling  of  the  spectra  of 
some  material  near  that  pixels  location.  It  is  the  hyperspectral  data  that  uses  a 
significant  number  of  samples  for  each  pixel. 

In  reality,  hyperspectral  data  exists  for  pretty  much  any  application  you  could 
want.  Geologists  use  it  to  study  the  mineral  composition  of  far  away  regions,  or  to 
study  small-scale  formations  in  core  samples.  Our  greatest  space  telescopes  collect 
dense  spectral  information  all  the  time,  to  peer  into  the  mysteries  of  the  universe. 
In  this  chapter,  hyperspectral  data  will  explicitly  refer  to  hypercubes  of  spectral 
reflectance  information.  A  somewhat  overused  example  of  a  hypercube  exhibits  the 
three  dimensional  character  of  the  data. 

Figure  3.4:  The  Prototypical  Example  of  a  Hypercube. 


The  most  important  properties  of  hyperspectral  data  are  their  narrow  and  con¬ 
tiguous  measurements,  being  spectrally  over  determined  and  their  utility  in  target 
detection,  material  classification  and  surface  mapping  with  material  identification. 
Two  very  important  and  often  used  hyperspectral  data  sets  are  referred  to  as  the 
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Urban  and  Smith  Island  data  set  respectively. 


Set. 


These  sets  are  well  known  and  often  used  for  good  reason,  they  are  some  of 
the  few  collections  of  data  that  are  not  highly  classified  and  restricted  from  public 
study.  These  will  be  our  primary  test  data  sets,  and  many  of  our  results  will  be 
based  on  these  hyperspectral  sets,  since  others  are  not  allowed  to  be  published  in 
this  thesis.  The  Urban  data  set  is  307  by  307  pixels  and  consists  of  roughly  161 
spectral  bands.  Note  the  large  prominent  Walmart  near  the  center  of  the  image.  If 
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you  look  closely  in  the  western  portion  of  the  image,  you  can  make  out  pixel/sub¬ 
pixel  swimming  pools  which  will  crop  up  later  in  the  chapter.  The  Smith  data  set  [2] 
is  679  by  944  pixels,  and  contains  126  spectral  bands.  If  you  look  really  closely  in 
the  watery  portion  of  the  western  half  of  the  island,  for  several  spectral  bands  you 
can  make  out  sub-sea  level  structures  below  the  water.  These  are  the  elusive  targets 
that  we  shall  forever  endeavor  to  discover.  Figure  3.7  is  a  statistical  envelope  for 
three  spectral  classes  within  the  Smith  Island  dataset.  The  classes  represented  are 
water,  sand  and  flora,  the  dotted  line  represents  an  average  class  spectra  and  the 
filled  in  portion  represents  a  standard  deviation  in  each  direction  from  the  mean. 

Figure  3.7:  Statistical  Spectral  Envelopes. 


Here  are  a  couple  of  examples  (Figures  3.8  and  3.9)  of  spectral  bands  from  the 
Urban  set.  Note  that  the  true  color  images  are  acquired  from  taking  bands  closest 
to  red,  green  and  blue  and  put  their  reflectance  together  for  a  color  image.  This 
shows  that  the  spectral  coverage  of  the  data  sits  well  within  the  visual  wavelengths 
of  light. 
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Figure  3.8:  Urban  Tile  Spectral  Band.  Figure  3.9:  Urban  Tile  Spectral  Band. 


3.3  Kernel-Based  Frame  DR  Methods:  The  Pipeline 

We  start  with  a  hypercube,  X ,  that  is,  a  real  valued  N\  x  N2  x  D  dimensions. 
Since  our  data  will  almost  always  contain  hyperspectral  information,  we  can  think 
of  this  cube  as  D-spectral  bands,  where  each  of  the  bands  is  N\  x  N2  representing 
the  same  geographical  region  of  space,  where  a  pixel  in  each  band  represents  some 
reflectance  of  that  bands  wavelength  from  the  region  the  pixel  represents. 

We  can  think  of  this  as  a  regular  sampling  of  some  manifold  M,  which  has  an 
unknown  underlying  dimension  d  «  D.  The  consensus  is  not  yet  in  on  whether  this 
is  a  helpful  perspective  to  take,  concerning  an  underlying  manifold  even  existing. 
For  instance,  clearly  the  data  in  X  can  be  embedded  into  infinitely  many  manifolds, 
or  weirder  yet,  into  some  one-dimensional  curve  in  space  by  essentially  connecting 
the  dots. 

This  frame  of  reference  is  further  confused  upon  the  realization  that  spectral 
sampling  may  itself  not  be  possible  since  the  underlying  surface,  i.e.,  the  surface  of 
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Figure  3.10:  Typical  Hyperspectral  Data  Cube  Point  Xj  G  HD. 


the  Earth,  may  in  fact  be  discrete,  assuming  some  smallest  form  of  matter. 

Even  with  these  questions  looming  about  the  veracity  of  underlying  manifolds 
existing  for  hyperspectral  data,  we  will  accept  that  the  mathematics  are  simpler 
assuming  so,  and  for  us,  X  will  be  a  sampling  from  a  lower  dimensional  manifold. 
This  allows  us  to  use  analytical  results  about  graph  Laplacians  by  exploiting  the 
underlying  structure  of  the  manifold. 

We  unfold  our  data  cube  into  a  list  of  points  Xdata  =  C  RD,  where 

the  points  Xk  are  D-dimensional,  and  are  columns  in  the  D  x  NiN2  matrix  Xdata. 

We  take  this  collection  of  points  and  from  it  create  a  directed  graph,  G,  using 
one  of  the  following  criteria: 

1.  fc- nearest  neighbors.  We  choose  some  1  <  k  «  N  =  N\  iV2  and  generate  the 
set  of  neighbors  to  the  point  Xj,  £1^.  where  we  define 
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yXi  =  argminxeX\{xj}  \  \x  -  Xj 

has  y *3  G  and  recursively, 


Vi  -  ar9mmxex\{Xj ,vl\ W-XjU. 


Then  fi*.  =  {y^3 ,  ...ryxk3}.  In  this  case,  for  x  G  G,  x  y  iff  x  G  f l\. 


2.  e-neighbors.  Choose  an  e  >  0.  In  this  simple  case,  we  assert  that  x  ^  y  iff 
x  G  Be(y). 


The  choice  of  neighborhoods  for  our  graph  is  of  great  importance.  Note  that  e- 
neighborhoods  will  generate  non-directed  graphs,  while  /c-nearest  neighbors  can  have 
asymmetries.  Laplacians  contain  a  great  deal  of  localized  information,  essentially 
characterizing  local  geometry  within  the  data.  With  this  perspective  it  is  easy  to 
see  that  by  choosing  our  local  neighborhoods  arbitrarily  and  without  concern  for 
the  underlying  geometry  of  the  data,  our  constructed  graph  Laplacian  will  not  be 
as  useful  and  a  host  of  undesirable  features  can  crop  up  into  our  results. 

This  is  a  relatively  deep  problem,  about  detecting  local  geometry  in  discrete 
collections  of  points  and  is  very  much  still  a  complicated  issue  surrounding  the  cre¬ 
ation  of  geometrically  sensitive  operators  on  graphs.  Consider  the  following  simple 
example. 

Here  the  two-dimensional  data  in  X,  in  Figure  3.11,  is  much  like  what  raw 
data  typically  looks  like;  a  collection  of  points  with  no  obvious  geometry.  Figure 


Figure  3.11:  Data  Points  X  C  IR2.  Figure  3.12:  Manifold  X  C  M. 

3.12  could  be  considered  the  underlying  structure  of  the  data,  lying  on  the  one¬ 
dimensional  manifold  M.  Without  knowing  M  and  with  only  clues  provided  by 
the  data  itself,  how  do  you  choose  quality  neighborhoods  that  at  least  approximate 
the  structure  of  Ml  Note  that  it  is  possible  for  this  set  of  data  and  underlying 
manifold,  to  choose  an  e  or  k  for  which  the  following  two  data  points  (Figure  3.13) 
are  considered  neighbors. 

Figure  3.13:  A  Poorly  Chosen  Neighborhood. 


In  fact,  any  larger  e  or  k  will  provide  the  same  ‘bad  neighbor’  as  in  Figure  3.13. 
This  is  how  &;-nearest  neighbors  and  e-neighborhoods  can  fail,  since  they  are  really 
coarse  approximations  of  the  underlying  local  geometry.  Ideally,  you  would  like  to 
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find  these  local  neighborhoods  by  some  geodesic  distance  function,  which  somehow 


was  able  to  know  the  underlying  structure.  The  ideal  neighborhood  for  the  point  in 
question  would  probably  look  like  that  in  Figure  3.14. 

Figure  3.14:  A  Well  Chosen  Neighborhood. 


There  is  no  reason  to  think  that  choosing  our  neighborhoods  in  such  a  clumsy 
manner  could  return  elegant  results,  but  it  turns  out  that  often  local  Euclidean 
neighborhoods  are  enough  to  encode  the  important  geometry  of  the  data  into  the 
Laplacian.  While  there  is  ongoing  research  into  finding  more  intrinsic  neighborhoods 
within  data  [56],  by  choosing  e  or  k  sufficiently  large  (at  the  cost  of  extra  compu¬ 
tational  expense)  we  can  gather  neighborhoods  that  contain  the  relevant  geometry 
with  a  small  remainder  of  non- relevant  neighbors,  as  in  Figures  3.15  and  3.16. 

For  the  purposes  of  this  thesis,  these  neighborhoods  will  be  sufficient. 

3.3.1  The  Laplacian-Eigenmaps  Kernel 

Starting  with  the  graph  we  generated  in  the  previous  section,  ( G,E ),  and 
continuing  with  the  guidelines  established  in  [5]  [6] ,  choose  a  >  0  and  we  define  the 


Figure  3.15:  A  Decent  e— Neighborhood.  Figure  3.16:  A  Decent  k— Neighborhood. 
Laplacian  Eigenmaps  weights  between  connected  vertices  to  be 


W{xi,Xj) 


e  Xi  Xj 

0  otherwise 


Next  we  form  the  degree  matrix,  D  =  ( dij )  by 


I  0  *  ^  3 

|  J2k=iNlN2W(xj,xk)  i  —  j 
The  graph  Laplacian  can  then  be  defined  as 


L  =  D  —  W. 

3.3.2  Locally  Linear  Embedding 

Roweiss  and  Saul  in  [49]  developed  a  nonlinear  kernel  which  attempts  to  fol¬ 
low  the  following  general  steps,  which  seem  to  always  find  a  voice  with  nonlinear 


phenomenon: 


1.  Stick  to  not-too- nonlinear  structures, 


2.  Somehow  decompose  nonlinear  structures  into  linear  subspaces, 

3.  Generalize  the  eigenvalue  problem  of  minimizing  distortion. 

Their  method  for  constructing  a  kernel  consists  of  forming  a  graph  as  in  the 
general  case,  and  finding  fc- nearest  neighbors.  From  there,  the  process  differs  signif¬ 
icantly  from  the  Laplacian  Eigenmaps  case.  The  next  step  is  also  to  compute  a  set 
of  weights,  and  in  this  case  the  weights  rnnst  satisfy,  minimizing  the  residual  sum 
of  squares  when  reconstructing  each  point  from  its  neighbors, 


RSS(w)  =  Y 


2=1 


Xi 


'Y .  Wi,jXj 

j¥=i 


(3.1) 


P 


subject  to  the  constraints  that  Wij  =  0  unless  j  is  one  of  i’s  neighbors,  and  having 
a  unit  sum  vJhJ  =  1  for  each  i.  Solving  for  the  new  coordinates  from  the  weights 
then  involves  minimizing  the  functional 


«(n  =  £ 


2=1 


Vi  wi,jVj 


(3.2) 


P 


subject  to  the  constraints  JT  Ytj  =  0  for  each  j  and  YTY  =  /. 

Luckily  enough,  solving  this  problem  really  reduces  to  the  construction  of  the 


kernel 


K  =  (/  —  w)T(I  —  w), 


(3.3) 
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and  then  follows  the  same  procedure  in  the  case  of  the  Laplacian  Eigenmaps  kernel 
in  order  to  change  the  coordinates  to  the  lower  dimensional  subspace.  This  is  a 
convenient  change,  as  it  really  allows  for  our  pipeline  to  have  a  modular  assortment 
of  reduction  kernels. 

3.3.3  Dimension  Reduction  by  Change  of  Coordinates 

We  have  now  looked  at  a  couple  of  examples  of  how  to  transform  a  hypercube 
X  into  an  operator  that  faithfully  encodes  much  of  the  local  geometry  from  the  data. 
We  now  wish  to  change  the  coordinates  of  our  data  into  those  of  a  lower  dimensional 
space,  for  which  we  have  preserved  many  important  geometric  quarks  of  the  data 
while  projecting  it  onto  a  close  approximation  of  its  underlying  manifold. 

Specifically,  we  are  looking  for 

p:X  CRD4  Y  C  Rd, 

where  we  think  of  Rd  as  being  the  coordinate  space  of  the  underlying  manifold.  Our 
method  of  extracting  this  change  of  coordinates  will  involve  the  spectral  theorem 
and  only  retaining  d  of  the  highest  energy  spectral  directions.  These  are  by  no  means 
the  only  possibly  coordinates  in  which  the  data  can  be  represented.  In  some  sense, 
reducing  the  dimension  of  our  data  by  merely  lopping  off  lower  energy  spectral 
directions,  from  the  kernel  K,  is  both  not  elegant  and  throws  away  information 
that  could  contain  vital  details  about  the  data’s  geometry.  If  the  eigenvalues  of 
the  Kernel  decay  rapidly  after  some  number,  then  we  can  say  with  confidence  that 
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those  largest  eigenvectors  span  almost  all  of  the  space,  at  least  the  most  important 
subspace.  The  hope  is  that  with  data  that  really  does  exist  on  a  low-dimensional 
manifold,  and  with  proper  neighborhood  selection,  the  eigenvalues  of  the  Kernel  will 
decay  in  such  a  way  that  we  can  estimate  the  dimension  of  the  underlying  manifold 
and  project  our  data  onto  that  space  with  great  accuracy.  The  reason  that  we  use 
this  particular  change  of  coordinates,  is  that,  at  least  in  the  case  with  Laplacian 
Eigenmaps,  we  have  convergence  results  of  this  kernel  to  the  actual  graph  Laplacian 
in  probability,  given  very  mild  conditions  on  the  graph  [5]  [6]. 

We  decompose  K  into  its  eigen-decomposition, 

{(Ai,^)}^  where  Ai  >  A2  >  ....  >  Xn 

We  retain  the  d-eigenvectors  corresponding  to  the  largest  eigenvalues,  and  our 
change  our  coordinates  is  given  by 


Vk  =  p(xk)  =  (vi [k\,v2[k\,  ...,vd[k]), 

where  Vj[k]  is  the  fc’th  coordinate  of  the  eigenvector  corresponding  to  the  j'th  largest 
eigenvalue. 

3.3.4  Landmarking 

Kernel  methods  for  dimension  reduction  are  elegant  and  powerful.  However, 
they  inherit  a  fate  similar  to  their  data:  information  overload.  Storage  of  hyperspec- 
tral  data  is  cumbersome,  even  with  coarse  resolution  and  sparse  spectral  sampling, 
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the  amount  of  information  is  simply  too  great  to  be  processed  or  stored  indefinitely. 

The  process  of  creating  a  kernel  for  dimension  reduction  and  then  reducing  the 
dimension  involves  a  massive  computational  load,  scaling  poorly  with  the  spectral 
bands  D  and  even  worse  in  the  geographical  bounds  N\  and  N2.  The  main  bottle¬ 
necks  in  the  processing  are  the  neighborhood  selection  methods  ( k 2  log  k)  and  finally 
in  the  eigenvalue  problem  .  While  our  kernel  is  hopefully  quite  sparse,  by  choosing 
small  enough  neighborhoods,  the  eigenvalue  problem  still  scales  very  poorly.  One  of 
the  biggest  challenges  in  processing  massive  hypercubes  is  in  the  amount  of  physical 
memory  available  to  the  computer  quickly  becoming  too  little  to  process. 

Because  of  this  and  other  reasons,  it  is  of  great  benefit  to  reduce  the  compu¬ 
tational  load  as  well  as  the  memory  requirements  of  the  algorithm.  Land-marking 
is  a  process  in  which  we  drastically  reduce  the  computational  load  in  such  a  way 
that  we  preserve  most  of  the  relevant  geometry  and  our  ability  to  eventually  classify 
materials. 

We  first  select  a  sample  of  points  from  X.  There  are  many  ways  to 

choose  this  sample,  and  the  selection  method  comes  with  its  own  list  of  benefits 
and  consequences.  It  is  an  interesting  problem  to  consider  how  to  sample  a  large 
scene  for  the  purpose  of  reducing  computational  complexity,  while  not  leaving  out 
important  anomalies  that  are  represented  by  few  of  the  data  points.  We  shall  select 
a  uniformly  random  sample  from  the  data,  where  M  is  some  percentage  of  the  the 
number  of  data  points  X.  This  percentage  is  typically  two  to  twenty  percent.  One 
advantage  of  this  technique  are  estimation  bounds  on  error  and  rates  of  convergence 
established  by  [9].  Of  course,  one  of  the  disadvantages  are  a  lack  of  data  dependence 
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and  causing  the  algorithm  to  be  non-deterministic. 

After  obtaining  this  sampling,  Z,  we  generate  a  graph  from  the  points  in  Z 
and  create  a  kernel  from  this,  as  if  we  had  reformed  the  sampled  data  into  a  new 
smaller  data  cube,  as  this  picture  demonstrates. 


Figure  3.17:  Typical  Landmarking  Sample. 


We  proceed  as  if  this  were  a  full  data  cube,  create  a  kernel  and  project  the 
data  onto  the  lower  dimensional  space. 

Figure  3.18:  Dimension  Reduction  on  the  Sampled  Set. 


Vj  =  <vi(j),  v2(j),..,vd(j)> 


The  strength  of  the  land-marking  procedure  is  that  it  provides  a  natural  mech¬ 
anism  to  extrapolate  from  this  sampled  dimension  reduction,  a  full  projection  of  the 
entire  data  cube  onto  the  lower  dimensional  coefficient  space.  This  method  was 
developed  by  Bengio,  Paiement  and  Vincent  in  [9]  and  refined  by  Coifman  and  La- 
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fon  in  [15]  for  use  with  LLE.  These  papers  give  reasonable  estimates  relating  the 
sampling  percentage  to  the  result  fidelity.  Choose  an  e  X  \  Z ,  since  this  is  not 
a  sampled  point,  we  do  not  know  right  away  how  to  project  it  onto  Y .  The  idea  is 
to  represent  Xk  as  a  combination  of  ‘pseudo-neighbors’  from  the  sampled  collection 
Z.  We  essentially  End  a  collection  of  neighbors  for  Xk,  {z^,  Zk2,  Zk3, using  one 
of  our  criteria  for  neighbors,  where  we  exclude  all  of  the  vectors  that  are  not  part 
of  the  sampled  set. 

Luckily  we  do  have  information  about  the  projection  of  the  ‘pseudo-neighbors’ 
onto  the  space  Y,  so  we  will  extend  the  projection  onto  Y,  to  the  vector  Xk,  by 
expressing  it  as  a  weighted  sum  of  the  projections  of  those  same  pseudo-neighbors. 
The  only  step  remaining  is  to  recalculate  the  graph  weights  by  taking  into  account 
Xk,  and  then  we  can  define  the  projection  of  Xk  as 


Figure  3.19:  A  Schematic  of  the  Out-of-Samplc  Extension. 

yi  =  <v1(j),  v2(j),..,vd(j)>  x,~{zk  z,  z, 


Y  =^VklW(xl,zk|) 
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3.3.5  Endmember  Selection 


The  fourth  step  in  our  algorithm  is  to  select  endmembers  for  the  low  dimen¬ 
sional  space  Y  C  Rd.  Endmembers  are  typically  defined  to  be  the  constituent 
elements  of  a  scene,  that  is,  they  are  generally  thought  to  compose  the  pure  ele¬ 
ments  in  a  spectral  data  cube.  Traditional  applications  of  endmember  algorithms 
are  run  on  the  original  high  dimensional  data  set  X  C  R15  and  if  s  denotes  the 
number  of  endmembers  then  s  <  D.  Since  we  are  finding  endmembers  for  the  space 
Y,  we  propose  finding  s  >  d  endmembers,  thus  creating  a  frame  <f>  =  {0*} f=1  for  Y. 
Frames  arise  naturally  in  dimension  reduction,  and  are  in  fact  a  generalization  of 
orthonormal  bases.  There  are  many  endmember  selection  algorithms  available,  e.g., 
N-FINDR  [57],  ORASIS  [11]  and  Pixel  Purity  Index  [10];  see  also  [58]  and  [36].  The 
results  of  this  section  employ  the  Support  Vector  Data  Description  (SVDD),  see, 
e.g.,  [4]  algorithm  for  selecting  endmembers.  The  core  idea  of  SVDD  is  to  obtain 
a  minimal  spherical  shaped  boundary  around  the  data  set,  which  in  turn  gives  a 
description  of  the  data  in  terms  of  a  set  of  support  vectors. 

3.3. 5.1  Frame  Coefficients 

Given  a  frame  $  =  {bj}f=1  for  Y,  we  shall  find  a  set  of  coefficients  C  = 
l  that  represents  Y  in  terms  of  <P: 


iji  =  Ci,j4>j  f°r  all  i  =  1, N.  (3.4) 

l=i 

We  propose  two  separate  ways  to  find  C ,  allowing  for  the  pursuit  of  this 
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modular  pipeline.  The  first  is  based  on  the  frame  operator  S  :  — »  Rd,  which  is: 


Sy  =  <  y,4>i>  <f>i  for  all  y  G  Rd.  (3.5) 

1=1 

For  any  frame  <f>,  the  frame  operator  S  is  invertible,  and  in  fact  gives  the  following 
representation: 


y  =  <  y,  S  1(j)i  >  4>i  for  all  y  e  Rd.  (3.6) 

i= 1 

Thus  we  can  define  the  coefficient  set  C  =  as: 

Cij  =<  y^S^^j  >  for  all  i  =  1, ...,  N,  j  =  1,  ...,s. 

The  coefficients  {qj}  are  called  the  canonical  coefficients  and  they  minimize 
the  energy  of  the  coefficient  set  C.  An  alternative  to  the  canonical  coefficient 
set  is  to  find  sparse  coefficient  representations.  Such  coefficients  can  be  found  by 
minimizing  the  £p  energy  of  the  coefficients,  where  0  <  p  <  1: 

S 

cy.  =  arg mine  ||c||#>  subject  to  yl  =  Cj(j>j .  (3.7) 

3= 1 

3.3.6  Frame  Based  Kernel  Method  Results 

Before  moving  to  the  main  results  of  this  section,  there  are  a  couple  of  impor¬ 
tant  images  that  should  be  singled  out.  First,  the  pipeline  implemented  with  the 
urban  data  set  can  generate  the  following  class  map  in  Figure  3.20,  just  to  illustrate 
the  kinds  of  data  we  hope  to  end  up  with. 
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Figure  3.20:  Class  Map  of  the  Urban  Data  Tile. 


That  is,  we  tell  the  pipeline  which  kernel  to  use,  how  many  neighbors  to 
make  the  graph  implement,  how  to  compute  the  reduced  coefficients  and  sampling 
percentages  for  the  landmarking  process.  Once  all  of  these  ideas  are  fixed,  there 
are  only  two  important  numbers  to  consider.  One  is  the  number  of  dimensions  to 
reduce  to,  the  second  is  the  number  of  endmembers  used  to  represent  that  lower 
dimensional  set.  A  very  interesting  phenomenon  came  about  while  running  an 
experiment  on  the  urban  data  set,  that  included  permuting  those  two  significant 
variables  with  everything  else  being  fixed.  The  resulting  image  in  Figure  3.21  is  due 
to  that  experiment. 

This  image  is  very  special.  The  x-axis  represents  the  number  of  dimensions 
the  urban  data  cube  was  reduced  to,  with  a  range  of  roughly  10  to  35.  The  y-axis 
represents  the  number  of  endmembers  used  to  represent  that  data.  Note  that  when 
s  =  d  the  representing  set  is  a  basis,  and  when  s  >  d  the  representing  set  is  a  frame. 


Figure  3.21:  The  Frame  Flag. 


d 


The  black  diagonal  line  represents  the  boundary  between  a  frame  and  a  spanning 
collection  of  a  subspace.  This  particular  data  set  conies  equipped  with  ground  truth 
consisting  of  22  classes  divided  up  into  pre-known  pixel  locations.  We  can  assert 
the  quality  of  a  classification  into  those  22  classes  (like  in  Figure  3.20)  by  testing 
the  classified  ‘known  pixel’  locations  against  our  attempts  at  classifying  them.  This 
gives  us  a  percentage  of  true  classifications  and  false  positives.  The  color  of  this 
flag-image  represents  the  quality  of  the  classification  after  running  the  pipeline  on 
the  data,  where  all  classes  have  been  amalgamated  to  a  single  overall  classification 
percentage.  The  red  represents  very  strong  classification  while  blue  is  very  poor 
classification.  The  fact  that  the  red  clusters  above  the  frame-line  says  that  frames 
really  are  the  key  element  of  this  pipeline  and  the  addition  of  redundancy  to  the 
concept  of  endmember  is  a  valuable  one. 
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The  results  of  this  section  are  based  on  the  Smith  Island  HSI  data  described 


in  the  data  portion  of  this  chapter.  Here  these  are  graphically  displayed  in  Figures 
3.22,  3.23  and  3.24.  We  use  the  following  settings  in  our  pipeline:  Kernel:  LLE, 
Number  of  Neighbors:  50,  Number  of  Samples:  40000  pixels,  Number  of  reduced 
dimensions:  21,  Endmember  Algorithm:  SVDD,  Number  of  Endmembers:  69,  Type 
of  Coefficients:  Canonical,  Classifier:  Vector  Angle. 

3.4  Wavelet  Packets  and  Spectral/Spatial  Representation 

In  this  section  we  will  mirror  the  developments  in  [7]  to  give  a  short  introduc¬ 
tion  to  the  use  of  wavelet  packets  in  combining  spectral  and  spatial  representations 
for  dimension  reduction. 

Tools  for  the  analysis  of  hyperspectral  data  have  successfully  been  applied  to 
detect  and  classify  objects  in  areas  ranging  from  human  pathology  to  geophysics 
and  satellite  imaging.  Hyperspectral  image  analysis  relies  on  using  dimension  re¬ 
duction,  because  spectral  signatures  and  their  mixtures  are  supposed  to  lie  on  low¬ 
dimensional  manifolds.  Principal  component  analysis  (PCA)  is  a  common  dimension 
reduction  method,  [48].  It  is  a  linear  transform  that  determines  the  ‘directions’  in 
the  data  by  maximizing  the  captured  variance.  If  the  manifold  is  linear,  then  PCA 
is  the  optimal  choice. 

Dimension  reduction  methods  and  classification  in  hyperspectral  imaging  use 
only  spectral  information.  However,  the  images  also  have  spatial  contents  that  are 
useful.  Wavelets  and  their  multilevel  decomposition  are  well-suited  for  the  analysis 
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Figure  3.22: 


Frame  Coefficient  Maps  of  Smith  Island 
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Figure  3.23:  Smith  Trial  Ground  Truth  Results. 
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Figure  3.24:  Smith  Competing  Ground  Truth  Results  (total  only). 


# 

#c 

% 

#fp 

#fn 

Raw  data 

2743 

1957 

71% 

786 

786 

SVDD 

2743 

2088 

76% 

655 

655 

LLE  (d  =  19,...,  25) 

2743 

2059 

75% 

684 

684 

of  spatial  characteristics.  We  shall  combine  wavelet  analysis  with  PGA  to  capture 
spatial  and  spectral  data  distributions.  The  naive  approach  is  the  sequential  wavelet 
decomposition  of  each  spectral  band,  forming  an  hyperspectral  dataset  of  coefficients 
in  the  wavelet  domain,  and  performing  PCA  on  this  coefficient  hypercube.  This, 
however,  does  not  ‘show’  the  wavelets  that  they  are  in  fact  decomposing  many 
images  from  the  same  scenery.  While  decomposing  band-wise,  one  must  find  a  way 
to  incorporate  spatial  information  from  other  spectra. 
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The  Discrete  Wavelet  Transform  (DWT)  is  an  iterative  scheme  that  splits 
the  signal  into  approximation  and  detail  coefficients.  Each  level  is  computed  by 
passing  through  only  the  previous  approximation  coefficients.  The  Wavelet  Packet 
Transform  (WPT),  however,  decomposes  both  the  approximation  and  the  detail 
coefficients.  Contrary  to  the  DWT,  this  yields  a  full  wavelet  tree  decomposition 
that  offers  flexibility  in  the  choice  of  reconstruction  coefficients.  The  best  bases 
algorithm  by  Coifman  and  Wickerhauser  finds  the  best  subtree  for  reconstruction, 
i.e. ,  the  best  coefficient  set  according  to  an  entropy  measure,  [19].  See  Figure  3.25 
for  a  visualization  of  the  subtree  concept. 

Figure  3.25:  The  Entropy  Determines  the  Subtree  that  is  Chosen  for  Reconstruction. 


(c)  possible  wavelet  packet  subtree 


Wavelet  Packets  and  the  entropy  concept  offer  the  flexibility  to  transfer  spatial 
information  from  one  band  to  the  other.  We  separately  decompose  each  band  by 
using  the  WPT,  but  we  consider  a  common  entropy  over  all  bands.  The  common  en- 
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tropy  still  respects  the  correlation  between  different  frequency  bands  in  the  wavelet 
domain.  The  best  ’common’  subtree  is  determined  by  evaluating  the  common  en¬ 
tropy.  Each  node  of  the  common  subtree  then  is  a  set  of  coefficient  vectors  whose 
entries  refer  to  different  spectral  bands.  PCA  is  applied  to  these  vectors  and  we  then 
reconstruct  by  applying  the  inverse  WPT.  Fusing  spectral  and  spatial  information 
significantly  improves  the  output  of  classification  schemes. 

3.4.1  Wavelet  Packets 

This  subsection  will  contain  a  short  exposition  about  wavelet  packets  and  how 
they  will  be  utilized  in  the  scheme  described  in  the  above  section.  To  simplify  the 
notation  in  this  section,  we  will  assume  that  instead  of  a  traditional  data  cube, 
i.e. ,  3  dimensional  matrix  containing  D— spectral  slices,  we  suppose  our  ‘cube’,  A", 
is  a  D  x  M  matrix,  containing  d—  linear  spectra.  We  can  then  define  our  ‘pixels’, 
{xn}n=\  C  MD,  as  columns  of  the  matrix  X. 

Now  choose  an  orthogonal  wavelet,  with  mother  wavelet  0  and  father  wavelet 
■0  with  associated  scaling  filter  h  and  wavelet  filter  g.  Typically,  the  wavelet  decom¬ 
position  of  a  vector  is  done  in  the  following  way.  Let  the  initial  data,  the  vector, 
x  =  C0.  We  define  the  approximation  operator  H  and  the  detail  operator  G  acting 
on  a  vector  C  by 


(HC){k]=J2  C[n]h(n  —  2k), 

n 

(GC)  [k\  =  Y,C[n]g{n-  2k). 
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In  this  case  we  can  compute  the  wavelet  transform  of  the  vector  x  =  C0  by  cal¬ 
culating  the  values  of  {Cn},  the  approximation  coefficients,  and  {Dn},  the  detail 
coefficients  at  the  n’th  scale  by  following  the  schematic: 


Figure  3.26:  The  Wavelet  Coefficient /Filter  Tree. 
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There  is  nothing  keeping  us  from  completing  the  tree,  by  applying  H  and  G 
to  each  new  coefficient.  This  is  precisely  the  idea  behind  wavelet  packets,  and  the 
use  of  the  full  tree  produces  a  technique  that  is  rich  in  complexities.  In  the  wavelet 
decomposition,  calculating  the  coefficients  by  applying  the  operators  G  and  H  can 
be  both  tedious  and  not  often  revealing  about  the  coefficients  themselves.  Note  that 
if  we  call  w°  =  4>  and  w1  =  ^  that  we  can  write  some  of  the  coefficients  from  the 


C1[k}=(HC0)\k]^{x,w\k), 
D1[k]  =  (GCc)[k}  =  {x,w1_hk), 
C2[k]  =  (HCl)[k]  =  (x,w°_%k), 
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above  picture  as 


Di[k}  =  (GC1)[k]  =  (x,w'_%k). 


The  point  of  writing  this,  is  to  ask  ourselves  how  we  can  calculate  coefficients 
in  the  wavelet  packet  sense,  in  a  manner  analogous  to  this.  Namely,  if  we  continue 
our  tree  and  fill  it  out,  as  in  the  following  picture 

Figure  3.27:  The  Wavelet  Packet  Coefficient /Filter  Tree. 


where  now, 

Ci  =  cl 
Di  =  C\, 

D2  =  cl 

how  can  we  calculate  C?2  or  cl  in  as  clean  and  efficient  a  way  as  the  above 
calculations? 

Definition  4.  Let  w°  —  f>  and  w 1  =  -0.  We  can  then  define  the  sequence  {wn(x)}n£z+ 
of  wavelet  packet  functions  recursively  by 
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W2n{x)  =  J2hik]w  l,k(X)f 

k 

w2n+\x )  =  ^2g[k\wlk(x). 

k 

In  this  way,  we  can  define  the  wavelet  packet  transform  of  x  as  {C3k[n\  = 
(x,uj-k,n)}- 

3.4.2  Best  Basis  Algorithm 

Clearly  the  wavelet  packet  transform  of  x  is  quite  overcomplete.  We  wish  to 
reduce  the  total  information  contained  in  the  transform  by  eliminating  nodes  in  the 
tree  that  are  both  redundant,  and  in  some  way  add  very  little  to  the  reconstruction. 
This  is  called  the  best  basis  algorithm  and  is  a  major  component  for  compression 
of  data.  To  select  a  best  basis,  we  require  some  criteria  for  how  to  do  so,  and  so  we 
arrive  at  the  so-called  entropy  (cost)  function. 

Definition  5.  The  entropy  function,  E,  maps  vectors  (of  any  length)  into  the  real 
numbers  and  has  the  following  properties: 

E(0)  =  0, 

£([vi  v2])  =  E(Vl)  +  E(v2), 

where  [vi  v2]  is  the  concatenation  of  the  vectors  Vi  and  v2 . 

Examples: 
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1.  A  very  simple  entropy  function  is  the  threshold  function  with  the  threshold 
value  a, 

E«(v)  = 

j 

where  c[j\  =  1  if  |v[j]|  >  a  and  c[j\  =  0  otherwise. 

2.  A  more  useful  entropy  function,  from  an  information  theory  viewpoint,  is  the 
Shannon  entropy  function, 

^Shannon  (v)  =  "  ^  ln  KbD  ■ 

j 

We  will  use  an  example  to  illustrate  the  mechanics  of  the  best  basis  algorithm. 
To  make  matters  simpler,  we  will  use  the  threshold  entropy  function  with  threshold 
value  a  =  1.1.  Our  wavelet  packet  transform  produces  the  following  coefficients,  in 
Figure  3.28. 

Figure  3.28:  An  Example  of  a  Wavelet  Packet  Coefficient  Tree. 
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Calculating  the  threshold  entropy  function  with  value  a  =  1.1  is  easy,  since  we 
must  only  count  the  number  of  entries  in  each  node  whose  absolute  value  is  greater 
than  a.  The  first  node  is  therefore  assigned  the  value  4  and  the  entire  entropy  can 
be  calculated  as  in  Figure  3.29. 

Figure  3.29:  An  Example  of  a  Corresponding  Entropy  Function. 

4 

R 

da 


The  algorithm  then  starts  from  the  bottom,  or  leaf  nodes,  and  simply  applies 
the  value  of  the  entropy  function  to  those  nodes,  and  considers  all  of  them  to  be 
part  of  the  best  basis.  The  next  step  looks  at  the  parent  nodes  to  those  leaves  and 
asks  the  question:  is  the  parent  node’s  entropy  value  smaller  than  or  equal  to  the 
sum  of  the  offspring  entropy  functions?  If  the  answer  to  this  question  is  affirmative, 
then  we  mark  the  parent  node  as  a  member  of  the  best  basis,  and  forget  its  children. 
If  the  answer  to  the  question  is  no,  then  we  replace  the  entropy  function  value  of 
the  parent  node  with  the  sum  of  the  entropy  function  values  of  its  children.  In  this 
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way  we  can  work  out  way  up  the  tree  and  at  the  end  have  marked  our  best  basis 
with  respect  to  that  particular  entropy  function.  In  our  example  we  would  end  up 
with  Figure  3.30 

Figure  3.30:  The  Best  Basis  Algorithm. 


.5, -2.3 


[f]  [A]  0  0 

where  the  colored-in  squares  are  the  final  best  basis.  Just  for  edification,  the 
Shannon  entropy  function  would  have  given  the  values  in  Figure  3.31.  resulting  in 
exactly  the  same  best  basis  in  this  particular  case. 

3.4.3  Inverting  the  Wavelet  Packet  Transform 

The  inversion  of  the  wavelet  packet  transform  can  be  easily  (and  efficiently) 
computed  by  noting  the  relationship  of  the  operators  G  and  H,  and  in  particular, 
their  adjoints. 
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Figure  3.31:  The  Shannon  Entropy  Function. 


(H'c)  [k]  =  £  c[n\  h[k  —  2 n], 

n 

(< G*c )  [k]  =  c[n\  g[k  -  2 n\. 

n 

Since  we  have  the  properties  that 

HH*  =  GG*  =  H*H  +  G*G  =  /, 

we  can  recover  the  coefficient  C Jk  by  writing 

Ci  H  =  H'CliM  +  G‘C^f[k]. 

3.4.4  Combining  LLE  and  Wavelet  Packets 

We  wish  to  use  the  properties  of  wavelet  packets  to  massage  our  data  cube 
into  a  sparse  and  more  natural  source  of  data  for  LLE  to  operate  upon.  The  plan 
roughly  follows  the  following  schematic  diagram. 


Ill 


Figure  3.32:  Spectral/Spatial  Dimension  Reduction  Schematic. 


This  will  a  short  guide  to  this  process,  just  keep  in  mind  that  the  notation  will 


still  be  within  onr  simpler  2-dimensional  framework. 


1:  Begin  with  a  hyperspectral  data  cube,  X ,  that  is  D  x  M,  where  D  is  taken  as 
quite  large.  We  then  slice  the  cube  into  individual  layers,  which  we  will  call 
xn.  Each  xn  lives  in  and  there  are  D  of  them. 

2:  We  choose  an  orthonormal  wavelet  system  with  associated  mother  wavelet  <f), 
father  wavelet  ^  and  assign  their  filters  to  be  h  and  g.  We  assign  w°  =  (j) 
and  w1  =  and  calculate  the  remaining  associated  wavelet  packet  functions, 
{wm},  by  the  recursive  formulas 

w2m(x)  =  '%2h[k]w?tk{x), 

k 

w2m+1(x)  =  ^2g[k\w^k(x). 

k 

For  each  slice,  xn,  we  calculate  the  wavelet  packet  transform  independently, 
which  we  will  denote  as  {C^k},  by  the  formula 

Cn,kH  =  (Xn,W-k,m)- 
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3:  We  then  specify  an  entropy  function,  E,  and  compute  each  of  the  values 
E(CJnk)  for  each  node  in  every  tree  representing  an  individual  slice.  We  then 
average  the  entropy  functions  together,  thereby  tying  each  slice  to  every  other 
one  when  it  comes  time  to  find  a  best  basis.  We  write  this  as 

E(4)  =  5  £  E(°U) 

n=  1 

where  we  understand  that  AJk  represents  a  position  in  any  of  the  individual 
trees.  Next  we  replace  the  assigned  entropy  values  in  each  tree  with  the  average 
version  calculated  from  E.  That  is,  every  node  that  is  in  the  same  position, 
will  have  precisely  the  same  entropy  value,  across  every  tree.  We  then  apply 
the  best  basis  algorithm  to  only  one  of  the  trees  (since  they  share  the  same 
entropy  values,  the  result  will  be  the  same  for  every  tree). 

4.  The  next  step  involves  thresholding  the  best  bases  that  we  have  just  generated. 
This  will  add  sparsity  once  we  have  recreated  our  cube. 

5.  Now  we  wish  to  take  our  threshed  best  bases  and  force  them  to  look  like  a 
data  cube  that  LLE  can  act  upon.  One  way  to  do  this  is  to  assign  nodes 
earlier  precedence  based  on  a  lexicographic  like  ordering.  That  is,  we  have  the 
best  basis  represented  by  the  collection  { A3k }  where  k  G  ff  and  j  G  O,  where 
these  are  the  appropriate  index  sets  for  the  best  basis,  then  we  say  A3k  has 
precedence  over  A3k  if  k\  <  or  if  k\  =  /c2  and  j\  <  ji2.  This  gives  a  unique 
ordering,  and  allows  us  to  create  the  new  vectors  yn  as  the  concatenation  of 
the  threshed  coefficients  that  are  in  the  best  basis,  in  order  of  precedence.  We 
then  think  of  each  yn  as  a  row  in  the  matrix  Y,  which  has  D  rows. 
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6.  With  our  new  data  cube  in  hand,  we  assign  ’pixel’  vectors  to  be  the  columns 
of  Y,  establish  neighbors  and  neighbor  weights  and  then  create  an  LLE  ker¬ 
nel.  We  then  extract  the  d  most  important  eigenvectors  and  have  effectively 
reduced  our  wavelet  packet  cube’s  dimension  to  d. 

7.  We  retain  the  concept  that  each  row  in  the  LLE  reduced  version  of  Y  still 
represents  a  wavelet  packet  best  basis,  and  thus  each  can  be  reconstructed. 
We  take  the  rows,  yn,  for  which  there  are  d,  and  recalling  the  ordering  we  used 
in  step  5,  and  reverse  it,  forming  true  wavelet  packet  trees.  With  this  in  mind, 
we  can  invert  the  wavelet  packet  transform  using  the  formula 

cym]  =  ITC&hM  +  G’C'i+iN. 

for  each  rn  —  1, ...,  d. 

What  remains  after  inverting  the  wavelet  packet  transform  is  the  desired  result;  a 
d-dimensional  representation  of  the  original  D-dimensional  data  cnbe. 

3.4.5  Numerical  Results 

Given  a  full  wavelet  tree,  let  T  denote  the  collection  of  all  subtrees.  The 
entropy  E  (or  cost  function)  of  a  wavelet  packet  decomposition  is  a  nonnegative 
map  E  :  T  — >  R.  Given  n  spectral  bands,  we  now  decompose  each  band  by  the 
WPT.  This  gives  us  an  entropy  Et.  for  i  —  1,  ...,n,  for  each  band.  We  define  the 
common  entropy  Ec  through  the  weighted  £p  term 
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Ec  =  Y,wi\Ei\*  >  (3.8) 

i=l 

where  0  <  p  <  2  and  a  weight  vector  {wj}”=1.  Minimizing  the  entropy  yields 
the  best  subtree  for  reconstruction. 

The  assumption  that  the  underlying  manifold  in  the  wavelet  domain  is  linear 
might  not  hold.  It  is  much  more  likely  that  we  are  in  a  nonlinear  regime.  PCA  then 
cannot  capture  the  essential  information  by  itself.  There  are  two  ways  to  introduce 
nonlinearity.  We  could  replace  PCA  with  a  nonlinear  dimension  reduction  such  as 
Locally  Linear  Embedding  (LLE),  Laplacian  Eigenmaps  (LE),  Hessian  LLE  [28], 
and  Diffusion  Wavelets/Diffusion  maps  [18]  [14]  [15]  [17]  [16]  that  try  to  recover  the 
manifold.  However,  these  methods  are  computationally  expensive.  Our  simplistic 
approach  is  to  apply  (soft-)shrinkage  to  the  wavelet  coefficients  the  classical  concept 
for  compression  and  denoising.  In  the  context  of  hyperspectral  imagery,  however,  we 
use  the  neutral  formulation  ‘shrinkage  concentrates  spatial  and  spectral  information 
into  few  coefficients  in  the  wavelet  domain’.  Through  shrinkage,  the  output  of  the 
WPT  becomes  a  nonlinear  approximation  of  the  signal.  Moreover,  shrinkage  means 
smoothing  of  the  underlying  manifold.  This  is  a  step  towards  linearization.  Finally, 
we  apply  PCA  to  the  shrunken  coefficients  and  then  transform  back.  We  have 
applied  the  method  to  the  Urban  dataset.  Since  we  have  ground-truth,  we  can 
compare  our  proposed  method  with  standard  PCA  on  the  original  data  set  with 
respect  to  classification.  We  use  15  to  50  principal  components  that  capture  more 
than  99%  of  the  variance.  For  the  Figures  3.33  and  3.34,  we  have  chosen  the  Haar 
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wavelet,  with  p  =  1.3,  and  Wi  =  1,  for  each  i  =  1  Our  proposed  method 

outperforms  simple  PCA,  see  Figure  3.35. 


Figure  3.33:  Classification  Results  on  Urban  Data  Set. 


(a)  RGB  image  of  Urban  dataset  (b)  reconstruction  from  the  first  (c)  vector  angle  classification  us- 

PC A  component  of  the  wavelet  do-  ing  15  PCA  components  from 
main  shrinked  wavelet  packet  coeffi¬ 

cients 
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Figure  3.34:  Selection  of  Classes  of  the  Vector  Angle  Classification  Scheme  for  the 


Urban  Data  Set. 


Figure  3.35:  The  Combination  of  WPT  with  PCA  Outperforms  Simple  PCA. 
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Chapter  4 


The  Multiplicative  Zak  Transform 
4.1  Introduction 

In  1946,  D.  Gabor  [34]  and  the  systems  bearing  his  name  facilitated  the  re¬ 
discovery  of  a  useful  transform  first  discovered  long  ago  and  utilized  by  Weil  [55], 
now  referred  to  as  the  Weil-Brezin,  Zak  or  kq-transforms.  Taking  our  cue  from  Fol- 
land  [31],  “the  transform  is  referred  to  in  the  literature  as  the  Weil-Brezin  transform 
(the  name  we  shall  adopt)  or  the  Zak  transform.” 

The  Zak  transform  demonstrates  its  remarkable  flexibility  in  the  sheer  number 
of  applications  to  which  it  is  applied.  These  applications  include  solid  state  physics, 
signal  processing,  and  noncommutative  algebra.  Within  the  realm  of  time-frequency 
analysis,  the  Zak  transform  aids  in  the  characterization  and  creation  of  new  families 
of  Gabor  systems  and  is  integral  in  proving  the  most  general  form  of  the  Balian-Low 
obstruction  theorem. 

The  underlying  group  structure  of  Gabor  systems  is  Hr,  the  real  Heisenberg 
group.  This  structure  is  clearly  represented  in  the  summands  of  the  Zak  transform 
as  the  Schrodinger  representations  of  the  Heisenberg  group  [32],  A  corresponding 
transformation  for  wavelet  systems  would  certainly  depend  on  the  representations 
of  the  affine  group.  On  the  surface,  the  difference  in  the  action  of  the  Heisenberg 
group  and  the  affine  group  on  L2(M)  may  appear  superficial,  however,  due  to  the 
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non-transitivity  and  the  dearth  of  subgroups  of  the  affine  group,  adapting  the  Zak 
transform  directly  for  wavelet  systems  is  likely  to  fail. 

Auslander,  Eichmann,  Gertner,  and  Tolimieri  defined  a  multiplicative  Zak 
transform  [1],  mimicking  the  construction  of  the  Gabor  Zak  transform  by  taking 
into  account  the  quirks  of  the  affine  group.  This  limits  their  analysis  in  the  subse¬ 
quent  paper  [35]  to  a  subspace  on  which  the  action  is  transitive,  namely  the  Hardy 
space  L2(( 0,  oo)).  The  thrust  of  this  work  was  the  characterization  and  analysis  of 
wavelet  systems  that  form  frames  for  their  spans,  where  the  Fourier  transform  of  the 
generating  function  is  supported  on  the  positive  real  line.  Such  wavelets  cannot  span 
all  of  L2(1R)  since  they  do  not  represent  the  negative  frequencies.  Specifically,  they 
show  that  analyzing  the  wavelet  collections  described  above  with  the  multiplicative 
Zak  transform  reduces  to  checking  the  bounds  of  the  function 

ieJ 

almost  everywhere  on  [1,2),  where  the  index  set  J  has  finite  cardinality. 

Segrnan  and  Schempp  [50]  constructed  an  alternate  form  of  the  multiplica¬ 
tive  Zak  transform.  Their  results  allow  the  characterization  of  wavelet-like  frames 
through  the  values  of  the  transform  on  a  compact  subset  of  M2.  The  reason  that  we 
do  not  use  their  transform  is  because  their  goal  was  the  incorporation  of  scale  into 
the  Heisenberg  group  and  not  necessarily  the  characterization  of  wavelet  frames. 
Because  of  this,  their  analyzed  wavelet  frames  do  not  correspond  to  those  generated 
by  representations  of  the  affine  group. 

In  this  chapter  we  shall  extend  the  transform  constructed  by  Gertner  et  ah, 
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weaken  the  condition  requiring  support  within  (0,oo),  allowing  limited  support  in 
R  and  study  possible  methods  to  remove  the  requirement  of  compact  support  of  the 
Fourier  transform  of  the  generating  function.  Through  the  use  of  superframes  [3], 
we  will  demonstrate  a  possible  framework  within  which  wavelet  frames  on  the  real 
line  could  be  characterized  by  the  multiplicative  Zak  transform.  Finally,  we  will 
introduce  a  simple  inversion  formula  for  the  multiplicative  Zak  transform. 

4.2  Notation 

We  will  be  discussing  both  Gabor  and  wavelet  systems.  Gabor  systems  will 
always  be  generated  by  the  letter  g.  All  other  Latin  letters  denote  functions  acting  on 
the  time  domain.  Greek  letters  will  denote  functions  acting  on  the  Fourier  domain 
with  the  exception  that  b  will  be  considered  as  acting  in  the  time  domain  as  a 
wavelet  generating  function.  Occasionally  there  will  be  objects  that  live  on  (0,  oo), 
in  each  of  these  cases  we  mean  that  (0,  oo)  C  M.  The  capital  Greek  letters  <t>  and 
T  will  always  denote  dilation  periodic  functions  acting  on  a  periodization  of  the 
Fourier  domain.  Often  these  functions  will  be  restricted  to  a  specific  set,  on  which 
they  are  uniquely  defined.  The  context  will  make  this  distinction  clear. 

4.3  Gabor  systems  and  the  Zak  transform 

Before  defining  the  multiplicative  Zak  transform,  it  will  be  instructive  to  briefly 
review  Gabor  systems  and  the  Gabor  Zak  transform.  Our  construction  and  use  of 
the  multiplicative  Zak  transform  on  wavelet  collections  will  mirror  that  of  the  Gabor 
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situation. 


Definition  6.  Let  a,  b  >  0  and  let  g  G  L2(R).  We  call  the  collection 

(g,  a,  b)  =  {gm,n (t)  =  e2mbmtg(t  -  an)  |  m,  n  G  Z} 

the  uniform  Gabor  system  generated  by  g  with  translation  parameters  a  and  b. 

Gabor  originally  used  a  Gaussian  in  the  place  of  g  with  parameters  a  —  b  —  1. 
It  turns  out  that  this  particular  collection  is  complete  in  L2(M)  but  does  not  form 
a  frame.  We  will  only  consider  the  case  when  a  —  b  —  1. 

Definition  7.  For  f  G  L2(M)  define  the  Zak  transform  to  be 

Z f(t,  OJ)  =  -  Ue2'““.  for  it,io)  6  [0,  l)2. 

4.3.1  Properties  of  the  Zak  transform 

Let  Q  =  [0,  l)2  and  consider  the  two  orthonormal  bases  of  L2(R)  and  L2(Q), 
respectively: 

(l[Wi)(t)  •  and  (e“  •  e2™™}. 

Proposition  1.  Z  :  L2(]R)  — >  L2(Q)  is  a  unitary  isomorphism. 

Proof:  This  is  clear  from  the  mapping 

ry  .  -n  ( ,\  fil'Kint  ,  .  _! 2ivimt  „2ninoj 

Z  ■  •  e  i->  e  •  e 

□ 

Corollary  1.  {/!}  is  a  frame  for  L2(JR ')  with  frame  bounds  A,  B  >  0  if  and  only  if 
{( Zfk )}  is  a  frame  for  L2(Q )  with  frame  bounds  A,  B  >  0. 
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Note  that  for  a  particular  gm,n{t)  =  e2mmtg(t  —  n ),  the  Zak  transform  ’pulls’ 


the  modulation  and  translation  off  of  g  in  the  following  way: 


(Zgm,n)  (t,  u>)  =  ]£  9m, nit  -  k)e2viku  =  YJ9{t-k- 

k  k 


9(t~k')e 


2nim(t—k' -\-n) ^27ri(k' _  ^2nimt^— 2irinuj 


( Zg )  ( t,u ) 


k'=n-\-k 


This  allows  us  to  study  the  properties  of  the  system  based  solely  on  the  values  of 
the  Zak  transform  of  g.  Putting  these  two  important  facts  together  leads  us  to  a 
theorem  illustrating  the  utility  of  the  Zak  transform. 


Theorem  1.  The  Gabor  system  generated  by  g  G  L2(M)  is  a  frame  for  L2(R)  with 
frame  bounds  A,  B  >  0  if  and  only  if 


A  <  \{Zg)  (t,u>)\2  <  B 


for  almost  every  ( t,iv )  G  Q. 

Proof:  This  proof  is  due  to  Heil  and  Walnut  [38].  Suppose  that  {gm,n}  is  a  frame 
for  L2(M)  with  frame  bounds  A,  B  >  0.  Then  we  have  for  each  /  e  L2(M)  that 


A 


L2(  R)  — 


<  \(f,gm,n)\2  <  B 


2 

L2(R) 


m,n 


Since  the  Zak  transform  is  an  isometry  we  know  that 


L2(R)  —  II^/IIl2(Q)  ' 


Furthermore,  we  can  use  the  unitary  property  of  the  Zak  transform  to  write 


^|</,<?m,n>|2  =  E  C Zf’(Zgm,n )) 


L2(Q) 


E 


^27 rimt  g — 2n  inu  ^  g  ^ 


L2(Q) 


m,n 


m,n 


m,n 
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=  E  | (Zf  ■  Zg,e^‘e-^)L2{Q) \  =  || Zf-  Zg\\l,(Q)  (4.1) 

m,n 

since  the  right  part  of  the  inner  product  in  equation  4.1  is  an  orthonormal  basis  for 
L2(Q)  and  applying  Parseval’s  theorem.  Suppose  that 

\Zg\2<A 

almost  everywhere  on  a  set  C  Q  of  positive  measure.  Since  Z  is  a  surjective 
isometry,  we  can  find  an  /  e  L2  (M)  for  which  Zf  =  F  =  1  on  fl  and  zero  everywhere 
else.  Then  we  would  have 

A  m(Q)  —  A  ||F||^2(q)  <  J  J  \FZg\f  dm  =  j  J  \Zg\2  dm  <  A  m(Q), 

clearly  a  contradiction.  A  similar  contradiction  arises  if  you  assume  that  \Zg\2  >  B 
almost  everywhere  on  a  set  of  positive  measure.  Thus  we  have  the  forward  direction. 

Now  suppose  that  A  <  \Zg\2  <  B  almost  everywhere  on  Q.  We  easily  prove 
this  direction  by  noting  that  for  /  G  L2(M) 

-411/nh.)  =  A\\zf\\i,m  <  \\zm\\2LHQ) 

=  Et/.*»,»)i2 

m,n 

=  \\Zm\\%iQ)  <B\\Zf\\lHQ}  =  B\\f\\lHm. 

□ 

This  result  is  a  striking  example  of  the  power  the  Zak  transform  provides  when 
analyzing  Gabor  systems.  A  short  list  of  results  obtained  similarly  from  Theorem  1 
and  can  be  summarized  in  the  following  theorem. 
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Theorem  2.  Let  g  E  L2(R). 


A.  The  Gabor  system  generated  by  g  is  complete  in  T2(R)  if  and  only  if  Zg  ^  0 
a.e.  on  Q. 

B.  The  Gabor  system  generated  by  g  is  an  orthoriormal  basis  of  L2  (R)  if  and  only 
if  \Zg\  =  1  a.e.  on  Q. 

Even  though  we  have  defined  the  Zak  transform  onto  L2(Q),  there  is  a  lot  of 
analytical  information  that  we  can  gain  by  extending  the  transform  to  be  defined  on 
all  of  R2.  We  can  achieve  this  by  using  the  pseudo-periodicity  of  the  Zak  transform. 
Namely,  for  any  g  E  T2(R), 


Zg(t  +  l,u)  =  ^2g{t  +  l-  k)e27riku  =  e2ni“  ^  g(t  -  k')e2*ik'u 

k'=k—  lEZ 

=  e2™Zg(t,cu), 

Zg{t,uj  +  1)  =  ^g(t-  k)e2mk{u+1)  =  e2nik^g(t  -  k)e2*iku}  =  Zg(t,u). 

k&L  k£Z 

The  Zak  transform  is  1-periodic  by  translation  in  the  variable  u,  and  is  a 
character  away  from  1-periodicity  by  translation  in  the  t  variable.  Using  these 
relations,  we  can  easily  extend  the  Zak  transform  to  all  of  R2.  An  important  result 
concerning  this  extension  is  the  following  famous  theorem 

Theorem  3.  Suppose  g  E  L2(R)  such  that  Zg  is  continuous  on  all  of  R2.  Then  Zg 
has  at  least  one  zero. 
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In  this  critical  sampling  case  of  Gabor  systems,  with  the  Zak  transform  ex¬ 
tended  and  defined  as  above,  and  with  this  seemingly  innocuous  result  about  the 
continuity  of  an  extended  Zak  transform  forced  to  take  a  zero,  we  arrive  at  the 
important  Balian-Low  obstruction  theorem. 

Theorem  4.  Suppose  that  g  e  L2(IR)  and  the  Gabor  system  generated  by  g  is  a 
frame.  Then  either  xg  ^  L2(IR)  or  7 g  ^  L2(!R). 

4.4  The  Multiplicative  Zak  transform 

In  this  section  we  shall  introduce  the  multiplicative  Zak  transform  created  by 
Gertner  and  Tolimieri  [35].  We  shall  modify  the  definition  so  that  we  have  more 
flexibility  in  the  types  of  functions  we  can  analyze.  It  should  be  noted  that  the  group 
structures  underlying  Gabor  systems  and  wavelets  are  vastly  different.  Both  are  non- 
commutative,  but  the  similarities  essentially  stop  there.  The  Heisenberg  group  has 
many  lattice  sub-groups  while  the  2-dilation  affine  group  essentially  has  very  few. 
The  Heisenberg  group’s  representations  over  L2(M)  (Schrodinger  representations) 
have  a  beautiful  duality  through  the  Fourier  transform.  The  affine  group  shares 
no  such  property.  A  further  complication  to  extending  the  ideas  from  the  Gabor 
setting  to  wavelets  is  that  the  Heisenberg  group  acts  transitively  while  the  affine 
group  has  three  orbits,  taken  as  actions  over  L2(M).  These  differences  will  show  up 
many  times  as  obstacles  in  defining  a  Zak  transform  that  parallels  that  of  the  Gabor 
Zak  transform. 

A  few  papers  have  approached  either  the  outright  construction  of  a  Zak  trans- 


126 


form  for  wavelets  (Gertner,  Tolimieri  [35]  [1])  or  have  toyed  with  representations  that 
include  time  and  frequency  with  scale  (Segman,  Schempp  [50]).  We  will  consider 
the  Gertner  and  Tolimieri  paper  in  this  section. 

We  arc  ultimately  interested  in  the  properties  of  the  wavelet  system  generated 
by  the  function  ^  G  L2(M),  viz., 

A  =  { )  =  2  2",0(2mt  —  n)  for  m,  n  G  Z}. 

Instead  of  working  in  the  time  domain,  we  are  going  to  switch  to  the  frequency 
domain  for  the  remainder  of  this  chapter.  For  ease  of  writing,  6  will  represent  a 
generic  function  supported  in  R,  and  ^  is  a  specific  Fourier  transform  of  a  wavelet 
generating  function.  Often  i/j  =  6.  We  can  see  that  if  A  has  ‘nice’  properties  in 
L2(M),  then  that  will  also  be  true  of  the  collection 

A  =  {$ m,n(l )  =  e-27ri7n2-m2^(2"m7)}  for  m,n  G  Z 

in  L2(R). 

The  multiplicative  Zak  transform  developed  by  Gertner  and  Tolimieri  used 
dilation  periodicity  heavily  in  the  framing  of  their  arguments.  They  use  this  property 
and  a  property  about  fractional  dilations  being  periodic  under  certain  circumstances 
to  prove  results  about  frames  generated  by  ijj  that  are  supported  in  (0,  oo).  This 
line  of  reasoning  actually  gets  in  the  way  of  extending  the  transform  across  7  =  0. 
We  shall  try  a  slightly  different  approach,  one  that  allows  the  characterization  of 
functions  having  Fourier  transforms  with  more  arbitrary  support.  The  concept  of 
dilation  periodicity  is  still  useful  in  constructing  the  transform  as  initially  done. 
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Definition  8.  We  say  that  a  locally  square  integrable  function  $  G  Lic(( °>°o))  is 
dilation-periodic  (DP)  if 

2 2  $(27)  =  $(7)  for  7  >  0. 

Note:  If  $  is  DP  then  it  can  be  completely  determined  for  7  >  0  by  its  values  on 
[1,  2)  by  the  formula 

$(7)  =  2-i$(2~fc7)  for  2k  <  7  <  2k+1. 

Here  we  could  replace  the  set  [1,  2)  by  E  where  {2 kE}  partitions  (0,  00). 

NB:  We  can  identify  a  DP  function  as  the  periodic  extension  of  a  function  defined 
strictly  on  the  sets  [1,  2),  or  more  generally  E.  The  context  will  make  domain  of  the 
function  in  question  clear. 

Definition  9.  Let 

L2dp« 0,oo))  =  {4  e  -LL((0,°o))  I  4  is  DP,  j  |4|2  <  oo}. 

This  is  a  Hilbert  space  with  the  expected  inner  product.  It  is  useful  to  think  of 
the  spaces  L2([l,  2))  and  L2op  as  being  the  same.  We  will  do  most  of  our  analysis  in 
L2(M),  but  occasionally  will  work  in  L2DP.  To  make  this  convention  clear,  we  will  use 
upper-case  greek  letters  to  specify  when  an  object  is  considered  part  of  L2([l,2)), 
and  will  use  a  tilde  on  such  a  letter  to  show  it  is  a  member  of  L2DP (technically 
defined  on  all  of  (0,  00),  but  determined  on  [1,2)).  The  only  distinction  being  that 
to  retrieve  T  G  L2DP  from  T  G  L2([l,2))  requires  merely  ’periodizing’  T  in  the 
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following  manner 


V(t)  =  2"t T(2"mf)  when  t  e  [2m,2m+1). 
Definition  10.  We  will  use  the  following  ONBs  of  L2([  1,2)) 

=  e2nim2~kt  for 

$n(w)  =  Jl0S^  e27rinlog2(w)  for  n  e  Z. 
Both  ONBs  obey  the  rule 

^o(w)  •  ^m+kM  =  ^m(w)  •  ^fc(w), 

$oM  •  <hm+fc(u;)  =  <hm(u;)  •  $k(u). 

An  additional  property  is  that 

|Tfc|  —  | fi/ 1 1  for  all  k. 


Note  that 


^m(t)  = 


te  [1,2) 


2-|e27rim2  kt  te[2k,2k+1) 


is  the  periodized  version  of  'I'm.  So  {'hm}  and  {$«}  are  ONBs  of  L2DP.  The  defi¬ 
nitions  of  {T.m}  and  {‘hn}  are  meant  to  mimic  the  behavior  of  the  basis  functions 
from  the  Gabor  Zak  case,  namely  {e2mnt}  and  {e2mmuJ}.  Note  that  the  logarithm  is 


meant  to  treat  dilations  much  as  the  Gabor  functions  treated  translations. 


The  following  is  an  orthonormal  basis  for  L2((0,oo))  that  comes  up  often 
in  analyzing  the  multiplicative  Zak  transform,  ft  is  essentially  half  of  a  Shannon 
wavelet  in  the  Fourier  domain. 
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Definition  11.  Let  gm^n{ 7)  =  2“?  l[2m;2m+i)(7)e27rm2  mT. 

Proposition  2.  is  an  CWL>  o/L2((0,  00)). 

Proof:  Let  m,n,k,l  G  Z.  Then 

{gm,n,gk,l)L 2((0>oo))  =  f  %  2  ( 1  (t)  ■  l[2fc,2fc+1)  (^))  '  ^e27"^’2  i2  ^  dt 

0 


(m+fc) 


2-“l[2m>2m+i)n[2*i2fc+i)(i)  e2ni(n2~m-12  ^  dt  =  0 

Jo 

when  m  ^  k.  Suppose  that  m  =  k.  Then  we  have  that 


(gm,n,  gk,l) L 2((0)OO))  ~ ' 


ll[2m,2m+1) (t)  e 


27ri(n— 1)2 


dt 


=  2‘ 


>m+l 


;2^(n-Z)(2  ™i)  dt  =  /  e2«(n-I)t  dt 


J  2m  J  1 

Clearly  the  above  integral  is  zero  when  n  ^  l  and  is  unity  when  n 
{gm,n}  is  orthonormal. 


1.  Therefore 


It  is  well  known  that  the  collection  {e2mnt}nez  is  an  orthonormal  basis  of 
L2([l,2)).  Therefore  it  is  easy  to  see  that  for  each  fixed  m  G  Z,  the  collec¬ 
tion  {2~  2  1  pm  2m+!)  (t)  '  e2™2^}  nez  is  correspondingly  an  orthonormal  basis  of 
L2([2m,  2m+1)).  Thus,  since  elements  having  different  m' s  in  the  subscript  have  dis¬ 
joint  support,  the  collection  {gm,n}  is  an  orthonormal  basis  of  L2(Umez[2m,  2m+1))  = 
L2((0,oo)).  □ 
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Note:  For  all  n  E  Z 


^n(f)  ^  ^  9k,n  (kQ  • 

fcez 


Definition  12.  Lei  C[i,2)  =  [1,2)  x  [1,2).  For  9  G  L2((0,oo))  define  the  multiplica¬ 
tive  Zak  transform  Z*  :  L2(( 0,  cx)))  — >  L2(C[\p))  by 

(Z*9)  (t,u)  =  J2  ^9(2kt)<S>k{u). 
kez 

Restricting  the  values  of  the  multiplicative  Zak  transform  to  the  set  Cup)  serves 
the  same  purpose  as  restricting  the  Gabor  Zak  transform  to  the  set  Q.  However, 
unlike  that  case,  there  end  up  being  many  different  sets  for  which  this  transform 
will  be  interesting.  Gertner  and  Tolimieri  work  only  with  the  set  Cup)  though,  and 
so  will  we  for  the  remainder  of  this  small  section. 

Proposition  3.  The  multiplicative  Zak  transform  is  a  surjective  isometry  from 
L2((0,oo))  to  L2(C{1,2)). 

Proof:  We  need  only  to  look  at  how  Z*  operates  on  the  basis  {gm,n}- 

( Z*gm,n )  (t,uj)  =  gm,n(2kt)$k(uj) 

fcez 

Since  t  is  restricted  to  [1,  2),  we  have  that  k  —  m  and  so 

(Z*gmtn)  (t,u)  =  l[i p)(t)e2mnt^m(uj)  =  Vn(t)$m(u). 

□ 
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Alternately,  we  can  define  the  multiplicative  Zak  transform  on  any  function 
through  its  action  on  the  basis  {//„,  „}.  For  example,  given  any  6  G  L2((0,  oo))  we 
can  write 

0{t)  =  E  9k, l)  9k, l 

k,l 

thereby  making  the  multiplicative  Zak  transform  on  9  as  follows 


(Z*0)  {t,u>) 


z* 


E  9k, l)  9k, l 

-  k,l 


(' t,w )  =  E  (°’9k,i) 

k,i 


The  multiplicative  Zak  transform  enjoys  the  following  properties,  as  a  direct  parallel 
to  the  Gabor  Zak  transform. 


Proposition  4.  PI: 


2^  (Z*9)  (2 t,u) 


®o(u) 


(Z*9) 


P2: 


25  (Z*9)  (t,  2u)  =  ( Z*9 )  (t,u), 


PS: 


I  \{Z*9)\  | 


T 2 

ldp 


11*1 


2 

L2([0,oo))  ' 


Proof:  PI:  2a  (Z*9)  (2t,u)  =  2a  2ad(2fc2f)<f>fc(a;).  Setting  l  —  k  +  1  we  see  that 
we  have  £,€Z2^(2'()<f>, -!<«,)  •  =  |fg 

P2:  2i  (Z’0)  (t,  2w)  =  230(2*«)23  .$*(2w)  =  £„ez  230(2 H)<S>k(uj)  =  Z'0(t,  w). 

P3:  ||Z*0||2  =  /2J;2  EM2i0(2‘()4tH  2 
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□ 


=  EfeezEmez2^2"1  fi  0{2kt)e{2mt)  dt  f?  $k(u)  ■  <S>m(u)  du 

Let 

«,.,»(*)  =  2-?o(2-"t)Mj. 

Note  that  with  the  T’s  above,  the  s  are  exactly  the  elements  of  A.  Then  the 
fourth,  and  most  important  property  of  the  multiplicative  Zak  transform  is 


P4: 


(Z*0m,n)  (t,  U) 


®m(u)  _  fln(t) 
$0(o;)  '  T0(t) 


(z*0) 


Unfortunately  P4  is  not  always  true.  In  fact,  this  is  one  of  the  strongest 
obstacles  to  utilizing  the  multiplicative  Zak  transform  to  analyze  wavelet  sets.  This 
result  holds  true  for  any  6  G  L2(0,  oo)  for  which  supp{6 )  C  [2^,  2Z+1)  for  any  l  G 
Z.  The  twisting  action  of  the  multiplicative  Zak  transform  cannot  overcome  this 
obstruction  in  general  because  unlike  the  case  of  Gabor  systems,  the  parameters  do 
not  sit  on  a  regular  lattice. 

This  can  be  seen  clearly  from  the  definition  of  the  multiplicative  Zak  transform 
as  the  action  on  the  basis  {gm,n}-  Formally 
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=  e-2™2-72- f  2- 1 !  [2fc)2t+1)  (2— 7)e2^2-‘(2-™7) 


m-\-k 

=  2 — 5-1 


Qk+m  2^+m+l^ 


(7)e 


-27ri2-(m+fc)(n2fe-07 


9m+k,n2k—  l  • 


The  factors  representing  dilations  act  additively,  while  the  translation  parameter 
has  a  non-commutative  twist.  The  real  problem  is  that  each  scale  becomes  wrapped 
up  in  the  translation  parameters  (this  is  multi-resolution’s  forte)  and  they  cannot 
be  separated  unless  the  original  function  has  a  very  nice  support.  Putting  this  all 
together  looks  like 


fpm,n{ l)  =  (  X  (^’  9k’1)  9k’1  I  ^  =  X  (^’  9k’1)  (3M)m,n  (t) 
V  k,l  )  m>n  k,l 

^  1  9k, 9k+m,n2k-l('l)i 


and  thus 


Z*  (X, n)  (t,w)  =  X  ®k+m{u)yn2k-l(t). 

k,l 

Here  it  is  impossible  to  neatly  split  these  formal  Tn2fc-z  apart,  unless  the 
original  function  ^  has  support  in  some  [2s,  2S+1)  for  some  s  G  N  to  force  all  of  the 
dilations  to  be  fixed  and  allowing  the  sum  to  split. 

Gertner  and  Tolimieri  constrain  their  functions  to  having  support  in  the  set 
[1,  2n)  for  some  N  and  alter  the  dilations  of  their  collection  of  functions  to  be 
fractional  with  denominator  N.  This  allows  them  to  prove  the  following  classification 
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theorem,  but  at  the  cost  of  altering  the  collection  of  functions  from  being  a  true 
wavelet  collection. 

Theorem  5.  (Gertner,  Tolimieri)  Let  tf  G  L2(M)  satisfy  supp(xf)  C  [1,2^).  Then 
is  a  frame  for  the  closed  subspace  it  spans  if  and  only  if 

N—l  2 

3 A,  B  >  0  such  that  A  <  <  B ,  a.e.  t  e  [1,2). 

3=0 

4.4.1  Superframes  and  Extension 

Part  of  the  problem  with  the  approach  in  the  previous  section,  is  that  while 
analyzing  wavelet  collections  of  functions  restricted  to  (0,  oo)  is  interesting,  ideally 
we  want  to  be  able  to  characterize  wavelet  frames  for  all  of  L2(R).  This  is  simply 
not  possible  when  restricted  to  one  half  of  the  real  line.  This  small  section  will 
suggest  a  method  that  could  be  used  to  fuse  together  wavelets  frames  for  the  Hardy 
spaces  L2(— oo,0)  and  L2(0,  oo)  for  the  whole  space  L2(R)  =  L2(— oo,  0)©L2(0,  oo). 
We  shall  apply  a  neat  idea  called  superframes,  and  attempt  to  use  this  structure 
to  glue  the  pieces  together.  This  tool  was  developed  by  Balan  in  [3]  to  efficiently 
represent  multiplexed  signals. 

Definition  13.  Suppose  that  F{  =  {x\},  ...,FN  =  {x^f}  are  frames  for  the  Hilbert 
spaces  Hi, ...,  H ^  where  H  =  Hi©...©lEv.  We  call  the  collection  F  =  Fi©...©Fat  = 
{x\  +  ...  +  x%}  a  superframe  if  it  is  a  frame  for  the  larger  space  H . 

Theorem  6.  /.  The  superset  (T\, ... ,FN )  is  a  superframe  if  and  only  if  the  following 
two  conditions  hold  true: 
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a.  Fk  is  a  frame  for  Hk  for  1  <  k  <  N, 


b. 


E1®...®En  is  a  direct  sum  and  closed  subspace  of  I2(Z), 
{0}  and  J2k=i  is  closed. 


i.e.  Ekn^kEi)  — 


II.  Suppose  that  F  =  (Fi, ...,  Fm)  is  a  superframe.  Then  there  is  a  superframe 
(F\, ...,  Fn)  called  the  dual  superframe  of  F  such  that  the  following  reconstruction 
formula  holds  for  every  fi  G  Hi, fN  G  HN: 

=  fabO)  Xn  =  Y. 

nEZ  \  1=1  /  nEZ  \  1=1  / 

Moreover,  if  E\, Em  denote  the  coefficient  ranges  of  the  component  sets  of  F, 
then  F  is  a  dual  superframe  of  F  if  and  only  if  for  every  1  <  k  <  N,  Fk  is  a  dual 
of  Fk  and  Ek  is  orthogonal  to  every  Et  with  l  ^  k. 


A  very  natural  question  that  arises  in  conjunction  with  this  definition  and  the 
previous  section  is,  are  all  wavelet  frames  for  L2(R)  wavelet  superframes  for  the 
subspaces  L2(— oo,  0)  and  L2( 0,oo)?  The  following  theorem  shows  this  to  be  an 
affirmative. 


Theorem  7.  Let  G  L2(R)  such  that  {'f>rn,n}  is  a  wavelet  frame  for  L2(R).  Let 
=  ^(7) l(_OOj0) (7)  and  =  ^(7) l(o, 00) (7)  be  projections  onto  L2(-oo,0) 

and  L2(0,  00).  Then  {t/’mn}  a  wavelet  frame  for  L2(— oo,0)  and  {^)n}  is  a 
wavelet  frame  for  L2( 0,  cx))  and  their  sum  n  +  n  =  ^m,n}  is  a  superframe  for 
L2(  R). 

Proof:  Define 


Ei  =  {c  G  Z2)  |  cmn  =<  f ,  ifmn  >  for  some  /  G  L2((-oo,  0))}, 
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E-2  =  {c  G  t{I?)  I  cm,n  =<  >  for  some  /  e  £2((°,  °°))}. 


We  need  that  Ed  fl  E2  —  {0}  and  E\  +  E2  is  a  closed  subspace  of  £2(Z2). 


Let  c  £  Ei  fl  E2.  Then  we  have  that  for  some  g  G  E2((— oo,0))  and  /  G 
E2((0,  oo)),  for  each  m,  n  G  Z 


/  ^(®)^m,n(a:)  dx  =  {g^m)U)  =  Cmin  =  (/,-0m,n)  =  /  f  (x)^m,n(x)  dX. 

J — oo  J  0 

We  can  extend  /,  g  to  all  of  E2(R)  by  defining  them  to  be  zero  off  of  their  respective 
natural  domains.  We  then  have  that 


(g(x)  ~  f(x ))  V’m.nfc)  dx  =  0 


for  every  mgn  G  Z.  Since  is  a  wavelet  collection  in  E2(3R),  the  only  function 

that  is  orthogonal  to  all  of  these  is  the  zero  function.  Therefore  g  —  f  =  0.  Since 
g  and  /  have  disjoint  support,  this  forces  g  —  f  —  0  and  therefore,  c  =  0.  Thus 

Ei  n  E‘2  =  {o}. 


Now  we  want  to  see  if  the  sum  Ej  +  E2  is  a  closed  subspace  of  ^(Z2).  This  is 
in  general  not  true,  that  is,  there  are  closed  sets  A,  B  for  which  A  +  B  is  not  closed. 
For  example  consider  A  =  7L  and  B  =  \/2Z.  Then  the  sum  A  +  B  is  a  dense  subset 
of  R,  and  so  is  not  closed.  Now  we  will  look  at  the  specific  sum  E1  +  E 2.  Take  a 
sequence  { ck}k ^  C  Ei  +  E2  such  that  ck  — *  c  in  £2(Z2).  This  means  that  there  is 
a  collection  {fk}ke%2  C  E2((— cxd,0))  and  {gfc}fcez2  C  E2((0,oo))  such  that 


m,n  m,n 


=  (fk,^m,n)  +  (gk,^vi,n)  ■ 
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Given  e  >  0,  there  exists  a  K  G  N  such  that  for  all  k  >  K  we  have 


2 

£2(Z2) 


c. 


m,n 


/ 


A'  +  Qk i 


<  e. 


m,n  m,n 

Since  the  collection  {Zm,n}  is  a  wavelet  collection  in  L2(] R),  we  know  that  for  fixed 
k,  we  can  represent  fk  +  gk  by 


/*+/  =  £(/*+ s‘.  Gt) s~‘  (W") =  Z  (t,„) , 

m,n  m,n 

where  S  is  the  frame  operator  for  the  wavelet  collection.  Since  the  collection  {cfc} 
converges  in  the  £2(Z2)  sense,  we  have  that  the  collection  fk  +  gk  converges  in  the 
L2(R)  sense,  to  a  function  h  G  A2(]R).  We  must  show  that 


h  =  ^  c™vS  1  (Vw)  =  f  +  9, 


m,n 


for  some  /  G  A2((— oo,0))  and  g  G  A2((0,oo)).  If  this  is  true,  then  we  would  have 
that  cm>n  =  (h^m^  =  (. h +  {h+Am,n)  for  each  min  e  z- 

Let  e  >  0  be  given.  Since  gk  +  fk  — >  h  in  L2(R)  as  k  — »  oo,  there  exists  K\  G  N 
for  which  II  h  —  ( fk  +  gk )  1 1  <  |  for  all  k  >  K]_.  Similarly,  since  ck  — >  c  in  £2(Z2)  as 


k  — >  oo  we  have  that  there  exists  Ko  £  N  for  which  |  |cfc  —  c| 


\e2(z2)  <  2 1 


-,  for 


\l2(R) 


all  k  >  K‘2 ,  where  A  is  the  frame’s  lower  bound.  Let  k  >  K  =  maxjA'i,  A'2}.  Then 
we  have  that 


Cm,n$ 


-1 


m,n 


<  ||'!-(/‘  +  0,‘)||  + 


(Cm,»  -  5  1  (V>) 

m,n 
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+  •£ 


\  k  1 2  6 

|Cm,n  —  Cm,n|  A  ^  T 

„  C-l  AA  T3n4- 


Ip 

eA 

A 

2  ip 

Therefore  we  must  have  h  =  V /  '  means  that  since  h  = 

/  +  g  =  hl(_oo,o)  +  ^l(0,oo)  where  /  G  L2((— oo,  0))  and  g  G  L2((0,  oo)),  then  we  have 
shown  that  ck  — >  c  =  a  +  b  where  a  G  A i  and  b  G  EA  Therefore  E\  +  E2  is  a  closed 
set.  □ 


This  result  generates  a  lot  of  hope  for  being  able  to  analyze  wavelets  with  the 
multiplicative  Zak  transform.  The  schematic  for  doing  so  appears  to  be  a  straight¬ 
forward  process.  Start  with  some  ip  G  L2(R). 


1.  Analyze  ip  ,  the  projection  of  ip  onto  L2(— oo,  0),  with  a  reflected  version  of  the 
multiplicative  Zak  transform  and  determine  if  it  is  a  frame  for  that  subspace. 

2.  Analyze  ip+,  the  projection  of  ip  onto  L2(0,  oo),  with  the  multiplicative  Zak 
transform  and  determine  if  it  is  a  frame  for  that  subspace. 


3.  Evaluate  some  ‘gluing’  condition  for  which  {ip m,n  =  f’mn  +  Vm  n}  is  a  super¬ 
frame  for  L2(R). 


This  presents  two  natural  challenges.  The  first  challenge  is  to  understand 
when  we  can  say  ippnn  is  a  frame  for  L2(0,  oo)  simply  by  analyzing  the  values,  or 
other  properties  of  Z*ip+ ,  even  when  ip  is  not  compactly  supported.  Eventually  we 
would  like  to  solve  the  following  conjecture 

Conjecture  1.  Let  ip  G  L2(R)  satisfy  supppip)  C  (0,  oo),  not  necessarily  compact. 
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Then  {Vvn}  is  a  frame  for  the  closed  subspace  it  spans  if  and  only  if 


3 A,  B  >  0  such  that 


A  < 


Z*fj 


<B, 


a.e.  ( t,uj )  G  [1,  2)2. 


The  second  challenge  is  that  of  finding  a  gluing  condition  that  allows  us  to  put 
together  two  distinct  frames,  match  their  parameters  and  have  a  superframe  for  the 
larger  space.  An  example  of  this  gluing  for  a  specific  function  (Shannon’s  wavelet) 
can  be  found  in  [22]  pages  67  -  73,  along  with  a  collection  of  well-known  estimates 
for  wavelets  frames,  for  which  we  believe  the  above  three  steps  will  be  comparable 
to.  This  problem  is  still  open. 


4.4.2  The  Multiplicative  Zak  Transform  on  Wavelet  Sets 

The  results  of  the  previous  section  are  a  little  unsettling.  The  affine  group 
forces  us  to  analyze  one  side  of  R  at  a  time,  and  to  only  do  so  on  special  sets. 
This  lack  of  freedom  is  in  stark  contrast  to  that  of  the  Gabor  Zak  transform,  and 
therefore  we  wish  to  extend  the  multiplicative  Zak  transform  in  some  way  so  that 
we  can  generate  results  resembling  that  ideal  case.  One  simple  way  of  modifying  the 
multiplicative  Zak  transform  is  by  allowing  ourselves  to  get  away  from  the  target  set 
C  [1,2)  =  [1, 2)  x  [1,  2),  and  allowing  a  more  general  analyzing  set  to  be  CE  —  Ex[  1,  2) 
where  E  is  a  wavelet  set.  An  excellent  introduction  on  the  theory  of  wavelet  sets 
can  be  found  in  Hernandez  and  Weiss  [39]. 

Definition  14.  (Wavelet  Set)  The  set  E  C  R  is  a  wavelet  set  if 

m  =  FT-HE(t) 
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generates  an  orthonormal  wavelet  basis  for  L2  (R) ,  where  ET  1  is  the  inverse  Fourier 
transform. 

Wavelet  frames  have  a  rich  theory  and  a  variety  of  properties;  they  are  a  tool 
for  which  wavelets  and  geometry  can  be  closely  linked.  A  few  key  properties  of 
wavelets  sets  are 

Proposition  5.  Suppose  that  E  is  a  wavelet  set.  Then  the  following  are  true: 

A.  {E  +  n}ne%  partitions  K., 

B.  {2 mE}me%  partitions  3R, 

C.  3  bijection  cte  '■  E  — >•  [—2,  —1)  U  [1,  2),  characterizing  the  wavelets  set. 

Definition  15.  Let  E  be  a  wavelet  set.  Let  6  e  L2(IR),  we  define  the  multiplicative 
E-Zak  transform  to  be 

Z*E9(t,u)  =  J2  ^0(2kt)^k(u) 

kGZ 

where  (t,u)  G  Ce  =  E  x  [1,2). 

The  multiplicative  E- Zak  transform  is  useful  because  it  shares  all  of  the  prop¬ 
erties  of  the  multiplicative  Zak  transform,  but  has  a  lot  more  flexibility  in  the 
star-crossed  fourth  property. 

Proposition  6.  Let  E  be  a  wavelet  set.  Then  the  following  are  true 
-  Z*E  is  a  surjective  isometry  from  L2(R)  to  L2{Ce), 
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PI: 


P2: 


2 HZ*E0)  (2 t,u) 


®o(u) 

$iM 


(z*Ee)  (t,u;), 


2 HZeO)  {t,2u)  =  {Z%6) 


P3: 


\\(Z*M\ 


ldp 


m 


Il2([0,oo))  ' 


Proof:  The  result  follows  from  the  original  multiplicative  Zak  transform  proofs 
combined  with  the  dilation  partition  property  of  the  wavelet  set  E.  □ 

This  gives  us  a  great  multitude  of  multiplicative  Zak  transforms  to  work  with, 
to  analyze  wavelets  with.  We  probably  do  not  need  to  force  the  wavelet  set  structure 
on  ourselves,  since  we  are  really  only  utilizing  the  dilation  partition  property  of  such 
sets.  However,  for  the  time  being,  we  will  continue  to  hold  this  frame  of  reference. 
Also  notice  that  we  are  not  restricted  to  one  half-line  of  R.  Note  that  by  freeing 
ourselves  of  the  static  set  C^)  we  can  attain  the  following  proposition. 

Proposition  7.  Let  if  G  L2(R)  such  that  supp{if)  C  E  where  E  is  some  wavelet 
set.  Then  the  following  property  is  true. 


PI' 


( Z E@m,n)  (t,  Oj) 


$0M 


^n(t) 

*o{t) 


(■ z%e )  (t,  u). 


Proof:  Let  G  Ce ■  Then 


(Z*Eem,n)  (t,u)  =  YJ2^em^2kt)^k{oj)  =  5^252-t 0{2k-mt)e-™n2-mt$k{u). 

fcez  fees 
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Since  t  G  E,  and  since  supp(8 )  C  E,  we  must  have  the  summands  are  zero 


except  when  k  =  m.  Thus  we  see 


^2*2“t  e(2k-mt)e-2nin2k~m%k{u)  =  e-27rint$m(uj)Z*E6{t,uj) 


iiM 


Vn(t) 
* o(t ) 


(Z*E0)  (t,oo). 


□ 


Having  this  fourth  property  of  the  multiplicative  Zak  transform  be  true  for  a 
wider  variety  of  wavelets  than  originally  possible  allows  us  to  resurrect  the  following 
theorem,  which  parallels  a  similar  theorem  in  the  Gabor  Zak  case. 


Theorem  8.  Suppose  that  if  G  L2(R )  such  that  {ifm,n\m,nez  is  a  (Fourier-side) 
wavelet  frame  for  L2(IR)  with  frame  bounds  A,  B  >  0  and  suppfif)  C  E  for  some 
wavelet  set  E.  Then  {ZEgm}n}mnez  is  a  frame  for  L2(CE)  with  the  same  bounds. 
Furthermore,  there  exist  0  <  A0  log2^e^A  <  A^  ,(B  <)  B0  (=  log2(e)£>)  <  oo  such 
that 


An  < 


Z*Eif 


<  B, 


a.e.  for  (t,u>)  G  CE. 


Proof:  Let  F  G  L2(CE).  Since  ZE  is  surjective,  there  is  an  /  G  L2(1R)  such  that 
ZEf  =  F  a.e.  on  CE.  The  following  calculation  justifies  the  first  claim  in  the 
theorem: 


E 

ra,nEZ 


f,z*eu 


L2{Ce) 


|  (^zEf,  zE  (v>TO,n) ) 


E 


/,  i>r 


L2(R) 
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since  we  know  that  for  any  /  G  L2( R), 


A 


-  < 
L2(R)  — 


£ 

m,nSS 


f  i  '0ra,r 


L2(R) 


<  B 


L2(R) 


and  we  know  that  for  all  /  G  L2(R),  1 1/| =  \\^Ef\^L2{cEV  we  can  conclude  that 


A\\F\ 


I L2(CE)  —  I  \^F,ZEyPm,r 

m,n£'Z 


<B\\F\ 


L2{CE) 


(4.2) 


for  all  F  G  L2(Ce)-  This  means  that  {Z*E  (VVnjjm.  inez  is  a  frame  for  L2(Ce)  with 
bounds  A,  B  >  0. 


To  show  the  second  conclusion  of  the  theorem,  consider  again  the  inequality 
in  (4.2).  Note  that 


F,Z*e  Wn  )  =  (F,Z*V>- 


•  Vhn 


—  (  F  ■ 


$0-T0 


Tm  •  Tn 


(4.3) 


Here  the  *  outside  of  the  parenthesis  represents  the  complex  conjugate.  Since  we 
know  that  {<hm(u;)  ■  '&n(t)}m,nez  Is  an  ONB  of  L2(Ce),  combining  (4.2)  with  (4.3) 
yields: 


A\\F\ 


L2(Ce)  - 


F  ■ 


Z*E ^ 


<f>  •  T 

^  O  1  O 


£ 

m.n^TL 


F-  I  I  ,J>ra.$, 


o  ^  o 


for  all  F  G  L2(Ce)-  This  shows  that 


inf 

{t,w)(E.CE 


sup 

(£,cj)eC£: 


L2(Ce) 


>A, 


<  B. 


l2(ce) 


<B\\F\ 


L2(Ce) 
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Since  \l/0  =  1  and  since  <3>0(u;)  =  \  1°S2^e-> ,  we  have  that 


- — —  ■  inf  ZfrJj  2  >  inf  ^Ei'  >  ^ 
log 2 ( C )  (t,u)£Cj:  (4,cz)GCe  $0  ' 


1  ^  r 2  /  ze^  /  D 

- - —  ■  sup  <  sup  — - —  <  B. 

I°g2(e)  {t,u)eCE  (t,u>)eCE  ®o-*o 

This  means  that 


Z  (  =  '°S^e)j4)  <  |Z^|2  <  (loga(e)B  =)BC 


a.e.  (t,u)  G  Ce ■  □ 

This  result  immediately  lets  us  write  down  a  theorem  that  is  a  modification 
of  the  conjecture  in  the  previous  section. 

Theorem  9.  Let  if  G  L2(M)  satisfy  suppifr)  C  E  for  some  wavelet  set  E.  Then 
{f/V«J  a  frame  for  L2(R)  if  and  only  if 


3A,  B  >  0  such  that  A  <  Z*Eif  <  B ,  a.e.  (t,u)  G  Ce- 


Note  a  theorem  which  resembles  the  above  result,  in  a  slightly  different  context. 
This  is  taken  from  [22],  page  973. 

Theorem  10.  If  the  ifm,n(t )  =  2^ if  (2 mt  —  n )  for  m,n  G  Z,  is  a  frame  for  L'2( R) 
with  bounds  A,B>  0,  then 


n2  f00  1^(7)  |2  In  2 

tatL  *7 
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This  shares  the  bounds  with  our  frame  test,  and  also  the  ability  for  a  frame  on 
one  half,  and  a  frame  on  another,  to  not  necessarily  combine  to  form  a  superframe 
for  the  whole  set. 

Note:  Unfortunately,  while  this  theorem  seems  like  a  large  improvement  over 
the  conjectures  and  theorems  of  Gertner  and  Tolimieri,  we  must  point  out  that 
in  the  cases  we  are  working  with,  that  is,  "0  C  E  where  E  is  a  wavelet  set,  the 
multiplicative  E- Zak  transform  is  identity  on  such  functions.  That  means  studying 
the  wavelet  generated  by  0  G  L2(1R)  for  which  snpp('ij))  C  E  for  a  wavelet  set  E, 
amounts  to  simply  checking  that  |0|2  is  bounded  above  and  below.  This  is  a  special 
case  from  [20],  problem  12.1  on  page  281.  With  this  in  mind,  we  wish  to  test  part  of 
this  multiplicative  Zak  transform  to  see  if  it  can  detect  the  simplest  kinds  of  wavelet 
frames.  Namely,  those  that  are  orthonormal  bases  for  L2(R).  That  means  we  should 
be  able  to  detect  the  following  simple  conditions  for  an  orthonormal  wavelet  basis, 
since  these  conditions  partially  classify  such  collections  of  functions.  For  the  rest 
of  this  chapter,  we  will  switch  the  use  of  the  variable  t  in  the  multiplicative  Zak 
transform,  to  7  to  make  the  Fourier  transforms  looks  more  natural. 


1^(2^) 

jew, 


1  a.e., 


(4.4) 


^0(2J7)0(2%  +  /))  =  0  for  Z  G  2Z  +  1.  (4.5) 

3= 1 

To  deal  with  the  first  of  these  characteristic  formulas,  we  define  the  alternate 
multiplicative  E- Zak  transform. 
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Definition  16.  We  define  the  alternative  multiplicative  Zak  transform  to  be 

Z*e9(  7,o;)  =  ^2-^(2fe7)$fc(a;), 

k&L 

where  ( t,u> )  G  Ce- 

Note  that  in  general,  we  would  not  expect  such  a  series  to  converge.  Consider 
the  following  formal  calculation 


Z*eQ{i,u)  ze°(  duj  = 


9(2k^k(u 


'  29(2l'y)$i(u)  du 


=  YJ2^0(2k1)6(2i1)  f\k{u>)  ^(u)(kj  =  J2  |0(2fc7)|2- 

k£% 

This  ‘Parseval-like’  equality  seems  to  show  that  the  information  contained  in 
equation  4.4  can  be  replicated  with  the  multiplicative  Zak  transform  and  its  alternate 
form.  We  can  make  this  precise  with  the  following. 


Theorem  11.  Let  if  G  L2(1R).  Then 


Z*E0( i,u)  Z*e9(i,u)  du 


almost  everywhere  7  G  E. 


kez 


Proof:  For  N  G  N,  we  define  the  function 

N 

Fn( 7,o>)  =  ^2  2-^(2fc7 )$fc(u;). 

k=-N 

Clearly  Fn  G  L2(E  x  [1,2))  for  every  N  G  N.  Since  G  L2(1R),  we  know  that 
Z*fi>  G  L2(E  x  [1,2)).  We  can  then  compute 
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N 


Z*E^(-f,u)FN(-f,uj)  du  =  EE  2  2  "0(2  t)'*/’ (2*7)  /  $fcM$j(w)  du 


fees  «=-iV 


=  £  1?(2fc7) 

fc=-7V 

for  every  JVeN.  □ 

Proposition  8.  Suppose  that  supp{% h)  fl  (—77,77)  =  0  /or  some  77  >  0.  Furthermore, 
assume  that  fi  e  L°°(R)  /or  some  mild  decay  towards  infinity).  Then  {i/v(7,cu)}  is 
a  Cauchy  sequence  arid  FE  — >  Z*Ei[)  pointwise  in  Ce- 

Proof:  Suppose  that  N  >  M  and  consider 


Assuming  that  our  wavelet  set  E  is  not  pathological,  we  can  safely  assume  that 


there  exists  a  K  e  N  such  that  if  k  >  K,2  kE  C  (—77,77)  and  consequently, 
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2-fc/U(2fc7)  d1+  Y.  2"fc/U(2fc7)  d1 


k=-M- 1 


k=M+ 1 


2~k  /  U(2fc7)  d7  <  E  2  /  |^(2^)  ^  <  e‘ 


k=M+ 1 


fe=M+l 


What  are  the  implications  of  such  a  formalized  calculation?  Suppose  that  9 
was  the  Fourier  transform  of  % \  and  {0m.n}  generated  an  orthonormal  basis  for 
L2(IR).  We  know  that  if  0  satishes  equation  4.4,  then  we  have  that 


/  ^*0(7,  w)  ^*0(7,  w)  dw  =  ^  |0(2fc7)  =  1. 

J1  fees 

Since  we  already  know  that  Z*0  is  well  defined  and  in  fact  L2([l,2)2),  this 
does  say  something  about  the  values  of  the  alternative  multiplicative  Zak  transform. 

The  following  is  a  formal  attempt  at  trying  to  find  the  information  contained 
in  equation  4.5  from  the  multiplicative  Zak  transform  and  its  alternate  version.  Let 
q  G  2Z  +  1  and  7  G  E.  Since  E  is  a  wavelet  set,  7 +q  0  E.  Our  periodicity  conditions 
in  the  multiplicative  Zak  transform  say  nothing  about  additive  periodicity.  However, 
there  exists  a  unique  l  G  Z  and  7 q  E  E  for  which  7  +  q  =  2l^q.  In  this  way  we  can 
calculate  the  formal  inner  product 


^00(7,  w)Z00(7  +  w)  du 


^2l0(2fc7)$fc(a;)  K^2-f0(2-(7  +  g))$m(u;)  du 
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E  2fe2m^(2fc7)^(2m(7  +  q))  /  $fc(o;)$m(a;)  du  =  ^'tp(2k'y)'ip(2k('y  +  q)). 


k,m.£Z 


We  can  also  write  the  expression  as 


Z*'ip( 7,  u)Z*^( 7  +  q,  u)  du  =  /  7,  u)Z*^>{ 2l'yq,  u)  du 


r2  ;  $  - 

=  /  Z*'ip{ 7,w)2  ^-Z—Z*^{jq,u)dw 

using  our  algebraic  manipulation  7  +  q  =  2*79  and  the  periodicity  of  the  multiplica¬ 
tive  Zak  transform.  We  can  take  this  extra  term,  2~^  and  note  that  in  the 
case  of  our  {47.},  this  can  be  rewritten  as  2~s  ,  which  will  allow  us  to  use  the 


periodicity  of  Z*  to  write 


/  Z*i/>(y,u)2  2^Mz*^(jq,u)  du  ^  2  1  [  Z*^( 2  l'j,u)Z*'ip(~/q,u)  du. 
'1  fzfwj  Ji 


4.4.3  Inversion  Formula 


Consider  the  Multiplicative  E-Zak  transform  on  the  function  ^  G  L2(1R).  Note 
that  since  the  inner  product  on  L2(]R)  is  defined  for  Schwartz  functions  and  since 
the  multiplicative  IN  Zak  transform  is  an  isometry  onto  L2(Ce)  then  we  can  extend 
the  multiplicative  E-Zak  transform  to  act  on  tempered  distributions.  Given  this 
fact,  for  7  G  R  we  have 


^(7)=  /  ^(s)Sy(s)  ds  =  =  (Z*e^,Z*e5. 
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Therefore,  to  compute  the  value  of  ^(7),  it  is  imperative  to  calculate  Z*ES7 
that 

(Z*ES7)  (t,u)  =  ]T2“^7(2fcf)$fc(u;)  =  2-f57(2mt)$m(cj), 


kez 


where  7  G  2 mE.  Given  this  we  can  calculate 


^(7)  =  (Z*Eil},Z*E87)  =  /  /  Z*E^(t,  u)2  2  57(2mt)$m(uj)  dt  du 

J 1  Je 


7 


=  22  J  Z*Eif{—,u)^m{u)du. 


This  creative  formula  leads  to  the  following  inversion  theorem. 


Theorem  12.  Let  F  G  L2(CE).  Define  the  function 


0(7)  =  (2  *  [  F(  2  k'j,u)^k(u)  du\  t2kE(l)- 

k£%  \  dl  J 


Then  9  G  L2(R)  and 


Z*E0  =  F  a.e.  ( t,u )  G  Cf 


Proof:  Let  6  be  defined  as  above.  Then  we  can  write  the  norm 


1 1*1  IV) 


|0(7)l 2dT  =  J2  \d(l)\2  d'y 


2  kE 


E 

fcez  ' 


r2  kE 


2  2  J  F( 72  cc)$fc(u;)  du 

-fc_ 


d'y. 


For  each  k  G  Z,  we  substitute  u  =  2  *7  and  obtain 


E 

fcez ' 


F(w,  o;)<f)fc(a;)  dec 


cLy. 


.  Note 
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Note  that  for  a  fixed  uQ  G  E,  we  can  define 


GUo(u)  =  ^2  (  /  F(u0,uj)$k(uj)  duj  $k(u). 

Since  the  collection  {$*.(0;)}  is  an  orthonormal  basis  for  the  space  L2([  1,2)),  we 
have  that  the  norm  of  GUo(u)  is  given  by 


IIG, 


E 


F(u0,uj)$k(uj)  du 


This  sum  clearly  converges  for  each  uQ  G  [1,  2)  and  thus  the  sum  converges  uniformly 
on  the  interval  [1,2).  But  this  means  that 

II^II2  =  II^IIW.2)“). 


e  G  L2(R). 

Now  we  wish  to  calculate 


Z*Ee(t,u)  =  ^2^(2fct)$fc(u;)  =  (2_l  / 

fcez  fcez  ^  ■’ 1 

=e(  f  F(t,S)M^ds)$ 

fces  ' 


F{(2kt)2~k,s)$k(s)ds  $fc(cu 


k(u)  =  Gt[u)  =  F(t,  u) 


for  almost  all  (t,ui)  G  Ce- 


Example:  Consider  the  function  F(t,u)  =  y°s^  on  Cy  i.  2).  Inverting  yields 

0(l)  =  l[i,2)(7)> 

where  Z*9{t,ui )  =  F(t,u )  almost  everywhere  on  C[ij2). 

Example:  Now  let  9  be  the  Mexican  hat  function.  That  is,  9(j)  =  72e-^.  Then 
we  can  estimate  the  values  of  \Z*9(t,co)\2  on  Gy i)2)  fl  [1, 1.5)2  by 
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Figure  4.1:  Estimate  of  the  MZT  of  the  MHF. 
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